Compressive Sensing over the Grassmann Manifold: a Unified Geometric Framework

Weiyu Xu and Babak Hassibi 



Abstract
$\ell_1$ minimization is often used for finding the sparse solutions of an under-determined linear system. In this paper we focus on finding sharp performance bounds on recovering approximately sparse signals using $\ell_1$ minimization, possibly under noisy measurements. While the restricted isometry property is powerful for the analysis of recovering approximately sparse signals with noisy measurements, the known bounds on the achievable sparsity level can be quite loose. The neighborly polytope analysis, which yields sharp bounds for ideally sparse signals, cannot be readily generalized to approximately sparse signals. Starting from a necessary and sufficient condition for achieving a certain signal recovery accuracy, the "balancedness" property of linear subspaces, we give a unified null space Grassmann angle-based geometric framework for analyzing the performance of $\ell_1$ minimization. By investigating the "balancedness" property, this unified framework characterizes sharp quantitative tradeoffs between the considered sparsity and the recovery accuracy of the $\ell_1$ optimization. As a consequence, this generalizes the neighborly polytope result for ideally sparse signals. Besides the robustness in the "strong" sense for all sparse signals, we also discuss the notions of "weak" and "sectional" robustness. Our results concern fundamental properties of linear subspaces and so may be of independent mathematical interest.
I. Introduction

Compressive sensing is an area in signal processing which has attracted a lot of attention recently [Can06] [Don06a]. The motivation behind compressive sensing is to do "sampling" and "compression" at the same time. In conventional wisdom, in order to fully recover a signal, one has to sample the signal at a sampling rate equal to or greater than the Nyquist sampling rate. The process of "sampling at full rate" and then "throwing away in compression" can prove to be wasteful of sensing and sampling resources, especially in application scenarios where resources like sensors, energy, observation time, etc. are limited. However, in many applications such as imaging, sensor networks, astronomy, and biological systems [RIC], the signals of interest are often "sparse" over a certain basis. In these cases, compressive sensing promises to use a much smaller number of samples or measurements while still being able to recover the original sparse signal exactly or approximately. What enables practical compressive sensing is the existence of efficient decoding algorithms to recover the sparse signals from the "compressed" measurements. One of the most important and powerful decoding algorithms is the Basis Pursuit method, namely the $\ell_1$ minimization method [CM73], [Don06b].

In this paper, we are interested in analyzing the decoding performance of the $\ell_1$ minimization algorithm for approximately sparse signals under possibly noisy measurements. Mathematically, in compressive sensing problems, we would like to find an $n \times 1$ vector $x$ such that

$$y = Ax, \tag{1}$$

where $A$ is an $m \times n$ measurement matrix, $y$ is an $m \times 1$ measurement vector and $m < n$ in general. In the usual compressive sensing context $x$ is an $n \times 1$ unknown $k$-sparse vector, which has only $k$ nonzero components. In this paper we will consider a more general version of the $k$-sparse vector $x$. Namely, we will assume that $k$ components of the vector $x$ have large magnitudes and that the vector comprised of the remaining $(n-k)$ components has an $\ell_1$-norm less than some value, say, $\Delta$. We will refer to this type of signal as an approximately $k$-sparse signal, or for brevity only an approximately sparse signal. It is also possible that $y$ is further corrupted with measurement noise. This problem setup is more realistic for practical applications than the standard compressive sensing of ideally $k$-sparse signals (see, e.g., [TWD+06], [Can06], [CRT06] and the references therein). Interested readers can find more on similar types of problems in [CDD08] and other references.

In the rest of the paper we will further assume that the number of the measurements is $m = \delta n$ and the number of the "large" components of $x$ is $k = \rho\delta n = \zeta n$, where $0 < \rho < 1$ and $0 < \delta < 1$ are constants independent of $n$ (clearly, $\delta > \zeta$).
A. $\ell_1$ Minimization for Ideally Sparse Signals

$\ell_1$ minimization (Basis Pursuit) proposes solving the following problem

$$\min \|x\|_1 \quad \text{subject to} \quad y = Ax, \tag{2}$$

where $\|x\|_1$ denotes the $\ell_1$ norm of $x$, namely the sum of the amplitudes of all the elements in $x$.
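To make problem (2) concrete, here is a minimal sketch that solves it exactly for very small instances. It is not the solver used in the paper (practical implementations use linear programming); it relies on the standard LP fact that, when $A$ has full row rank $m$, some optimal solution of (2) is supported on $m$ linearly independent columns of $A$, so enumerating all $\binom{n}{m}$ column subsets finds one. The function name `basis_pursuit_bruteforce` is hypothetical, and the approach is only practical for tiny $n$.

```python
import itertools

import numpy as np


def basis_pursuit_bruteforce(A, y, tol=1e-9):
    """Solve min ||x||_1 subject to A x = y by enumerating basic solutions.

    Assumes A (m x n) has full row rank m. Then some optimal solution of the
    LP is supported on m linearly independent columns, so checking every
    m-column subset finds it. Cost grows like binom(n, m): tiny n only.
    """
    m, n = A.shape
    best_x, best_norm = None, np.inf
    for support in itertools.combinations(range(n), m):
        cols = list(support)
        A_S = A[:, cols]
        # Skip (numerically) singular column subsets; they are not bases.
        if abs(np.linalg.det(A_S)) < tol:
            continue
        x = np.zeros(n)
        x[cols] = np.linalg.solve(A_S, y)
        norm = np.abs(x).sum()
        if norm < best_norm:
            best_x, best_norm = x, norm
    return best_x


# Usage: a 2-sparse signal observed through a 5 x 10 Gaussian matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 10))
x_true = np.zeros(10)
x_true[[1, 7]] = [1.5, -2.0]
x_hat = basis_pursuit_bruteforce(A, A @ x_true)
```

The returned `x_hat` is always feasible and has $\ell_1$ norm no larger than that of `x_true`; whether it equals `x_true` is exactly the recovery question studied in this paper.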

In [CT05] the authors were able to show that if the number of the measurements is $m = \delta n$ and if the matrix $A$ satisfies a special property called the restricted isometry property (RIP), then any unknown vector $x$ with no more than $k = \zeta n$ nonzero elements (where $\zeta$ is an absolute constant as a function of $\delta$, but independent of $n$, and explicitly bounded in [CT05]) can be recovered by solving (2). As expected, this assumes that $y$ was in fact generated by such an $x$ and given to us (more on the case when the available measurements are noisy versions of $y$ can be found in, e.g., [HN], [Wai06]).

As can be immediately seen, the previous results heavily rely on the assumption that the 
measurement matrix A satisfies the RIP condition. It turns out that for several specific classes of 
matrices, such as matrices with independent zero-mean Gaussian entries or independent Bernoulli 
entries, the RIP holds with overwhelming probability [CT05], [BDDW08], [RV]. However, it 
should be noted that the RIP condition is only a sufficient condition for $\ell_1$ optimization to produce a solution of (1).

Instead of characterizing the $m \times n$ matrix $A$ through the RIP condition, in [Don06b], [DT05] the authors proposed to study $A$ through a $k$-neighborly polytope condition. As shown in [Don06b], this characterization of the matrix $A$ is in fact a necessary and sufficient condition for (2) to produce the sparse solution $x$ satisfying (1). Furthermore, developing the results of [VS92], it can be shown that if the matrix $A$ has i.i.d. zero-mean Gaussian entries, then the $k$-neighborly polytope condition holds with overwhelming probability. The precise relation between $m$, $n$ and $k$ in order for this to happen is characterized in [Don06b]. It should also be noted that for a given value $m$, i.e., for a given value of the constant $\delta$, the value of the constant $\zeta$ given by the neighborly polytope condition is significantly better in [Don06b], [DT05] than in [CT05]. In fact, the values of $\zeta$ for the so-called "weak" threshold, obtained for different values of $\delta$ in [Don06b], approach the ones obtained by simulation as $n \to \infty$.
B. $\ell_1$ Minimization for Approximately Sparse Signals

As mentioned earlier, in this paper we will be interested in recovering signals that are not perfectly $k$-sparse from compressed observations $y$. In this case an exact recovery of the unknown vector $x$ from a reduced number of measurements is not possible in general. Instead, we will prove that, if we denote the unknown signal as $x$ and denote $\hat{x}$ as one solution to (2), then for any given constant $0 < \delta < 1$ and any given constant $C > 1$ (representing how close in $\ell_1$ norm the recovered vector $\hat{x}$ should be to $x$), there exist a constant $\zeta > 0$ and a sequence of measurement matrices $A \in \mathbb{R}^{m \times n}$ as $n \to \infty$ such that

$$\|x - \hat{x}\|_1 \le \frac{2(C+1)\Delta}{C-1}, \tag{3}$$

holds for all $x \in \mathbb{R}^n$, where $\Delta$ is the $\ell_1$ norm of any $(n-k)$ elements of the vector $x$ (recall $k = \zeta n$). Here $\zeta$ will be a function of $C$ and $\delta$, but independent of the problem dimension $n$. In particular, we have the following theorem.

Theorem 1: Let $n$, $m$, $k$, $x$, $\hat{x}$ and $\Delta$ be defined as above. Let $K$ denote a subset of $\{1, 2, \ldots, n\}$ such that $|K| = k$, where $|K|$ is the cardinality of $K$, let $K_i$ denote the $i$-th element of $K$, and let $\bar{K} = \{1, 2, \ldots, n\} \setminus K$.

Then for any constant $C > 1$ and any $\delta = m/n > 0$, there exists a $\zeta(\delta, C) > 0$ such that if the measurement matrix $A$ is the basis for a uniformly distributed subspace, then with overwhelming probability as $n \to \infty$, for all vectors $w \in \mathbb{R}^n$ in the null space of $A$, and for all $K$ such that $|K| = k \le \zeta(\delta, C)n$, we have

$$C\sum_{i=1}^{k} |w_{K_i}| \le \sum_{i=1}^{n-k} |w_{\bar{K}_i}|, \tag{4}$$

where $x_K$ denotes the part of $x$ over the subset $K$; and at the same time the solution $\hat{x}$ produced by (2) will satisfy

$$\|x - \hat{x}\|_1 \le \frac{2(C+1)\Delta}{C-1}, \tag{5}$$

for all $x \in \mathbb{R}^n$.
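The quantities in Theorem 1 are easy to evaluate for a given vector. The sketch below (hypothetical helper names) computes, for one null-space vector $w$, the largest $C$ for which (4) holds over all sets $K$ of size $k$ (the worst $K$ collects the $k$ largest magnitudes), and the right-hand side of (5) using the tightest choice of $\Delta$, namely the $\ell_1$ norm of the $n-k$ smallest-magnitude entries of $x$. It checks a single vector only; Theorem 1 concerns all vectors in the null space simultaneously.

```python
import numpy as np


def balancedness_constant(w, k):
    """Largest C with C * sum_{i in K}|w_i| <= sum_{i not in K}|w_i| for every
    |K| = k, i.e. condition (4) for this single vector w."""
    mags = np.sort(np.abs(w))[::-1]
    head = mags[:k].sum()   # worst-case K: the k largest magnitudes
    tail = mags[k:].sum()
    return np.inf if head == 0 else tail / head


def l1_error_bound(x, k, C):
    """Right-hand side of (5), 2(C+1)/(C-1) * Delta, with Delta taken as the
    l1 norm of the n-k smallest-magnitude entries of x (the tightest choice)."""
    mags = np.sort(np.abs(x))[::-1]
    delta = mags[k:].sum()
    return 2.0 * (C + 1.0) / (C - 1.0) * delta


w = np.array([3.0, -1.0, 1.0, 1.0, 2.0])
x = np.array([10.0, -0.5, 0.25, 0.0, 0.25])
C_w = balancedness_constant(w, k=1)        # 5/3 for this w
bound = l1_error_bound(x, k=1, C=3.0)      # 2 * 4 / 2 * 1.0 = 4.0
```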

The main focus and contribution of this paper is to establish a sharp relationship between $\delta$, $\zeta$ and $C$. For example, when $\delta = m/n$ varies, Figure 1 shows the tradeoff between the signal sparsity $\zeta$ and the parameter $C$, which determines the robustness^1 of the $\ell_1$ minimization.

^1 The "robustness" concept in this sense is often called "stability" in other papers, for example, [Can06].

The curve for $C = 1$ matches the "strong" threshold curve from [Don06b] for ideally sparse signal vectors.
Fig. 1: Tradeoff between signal sparsity and $\ell_1$ recovery robustness as a function of $C$ (allowable imperfection of the recovered signal is $\frac{2(C+1)\Delta}{C-1}$).
To obtain the stated results we will make use of a characterization that constitutes both necessary and sufficient conditions on the matrix $A$ such that the solution of (2) approximates the original signal accurately enough that (3) holds. This characterization will be equivalent to the neighborly polytope characterization from [Don06b] in the "ideally sparse" case when $C = 1$. Furthermore, as we will see later in the paper, in the perfectly sparse signal case (which allows $C \to 1$), our result for the allowable $\zeta$ matches the result of [Don06b]. Our analysis will be directly based on the null space Grassmann angle approach in high dimensional integral geometry, which gives a unified analytical framework for $\ell_1$ minimization.
A similar problem was considered in [CDD08], where the null space characterization for recovering approximately sparse signals was analyzed using the RIP of [CT05]; however, no explicit values of $\zeta$ were given. Since the RIP condition is only a sufficient condition for good sparse signal recovery using $\ell_1$ minimization, it generally gives rather loose bounds on the explicit values of $\zeta$, even in the ideally sparse signal case [CT05][BCT09]. There have also been some recent works trying to analyze the performance of $\ell_1$ minimization through non-RIP techniques [Zha08], [Vav09], [Sto09]. Compared with previous results, in this paper we will provide sharp bounds on the explicit values of the allowable constants $\zeta$ for satisfying the subspace "balancedness" condition, as a function of $C > 1$. In the literature, there are also discussions of compressive sensing under different definitions of non-ideally sparse signals; for example, [Don06a] discusses compressive sensing for signals from an $\ell_p$ ball with $0 < p \le 1$ using sufficient conditions based on results on Gelfand $n$-widths. However, the results of this paper deal directly with approximately sparse signals defined in terms of the concentration of the $\ell_1$ norm; furthermore, we give a simple necessary and sufficient condition for $\ell_1$ optimization to be robust, and we are also able to explicitly give much sharper bounds on the sparsity parameter $\zeta$. While finalizing this draft from our earlier conference publication [XH08], we were informed of the very recent work [DMM10], which deals with a related but different problem formulation: characterizing the tradeoff between signal sparsity and noise sensitivity of the LASSO recovery method. Compared with [DMM10], we deal with the plain $\ell_1$ minimization method for recovering approximately sparse signals, and the performance bounds in this paper apply to general types of signals and noise. The analysis in [DMM10] is an average-case analysis for compressed measurements corrupted with Gaussian noise, while the analysis in this paper provides both average-case and worst-case performance bounds under general types of signals and noise. It is also worth pointing out that this work considers plain $\ell_1$ minimization, which does not require the decoder to know the statistical variance of the measurement noise. The analysis methodologies of this work and [DMM10] are also different: this work relies on analytical tools from high dimensional polytope geometry, while [DMM10] builds on innovations in the analysis of message passing algorithms.

The rest of the paper is organized as follows. In Section II, we introduce a null space characterization of linear subspaces for guaranteeing robust signal recovery using $\ell_1$ minimization. Section III presents a Grassmann angle-based high dimensional geometrical framework for analyzing the null space characterization. In Sections IV, VI, and VII, analytical performance bounds are given for the null space characterization. Section VIII shows how the Grassmann angle analytical framework can be extended to analyzing the "weak", "sectional" and "strong" notions of robust signal recovery. In Section IX, we present the robustness analysis of $\ell_1$ minimization under noisy measurements. In Section X, numerical evaluations of the performance bounds for robust signal recovery are given. Section XI concludes the paper. In the appendix, we provide a quick summary of relevant concepts in high dimensional geometry and the proofs of related lemmas and theorems.

II. The Null Space Characterization

In this section we introduce a useful characterization of the matrix A. The characterization 
will establish a necessary and sufficient condition on the matrix A so that the solution of (2) 
approximates the solution of (1) such that (3) holds. (See [FN03], [LN06], [Zha06], [CDD08], 
[SXH08a], [SXH08b], [KT07] etc. for variations of this result). 

Theorem 2: Assume that $A$ is a general $m \times n$ measurement matrix. Let $C > 1$ be a positive number. Further, assume that $y = Ax$ and that $w$ is an $n \times 1$ vector. Let $K$ be a subset of $\{1, 2, \ldots, n\}$ such that $|K| = k$, where $|K|$ is the cardinality of $K$, and let $K_i$ denote the $i$-th element of $K$. Further, let $\bar{K} = \{1, 2, \ldots, n\} \setminus K$. Then for any $x \in \mathbb{R}^n$ and for any $K$ such that $|K| = k$, any solution $\hat{x}$ produced by (2) will satisfy

$$\|x - \hat{x}\|_1 \le \frac{2(C+1)}{C-1}\,\|x_{\bar K}\|_1, \tag{6}$$

if $\forall w \in \mathbb{R}^n$ such that

$$Aw = 0$$

and $\forall K$ such that $|K| = k$, we have

$$C\sum_{i=1}^{k} |w_{K_i}| \le \sum_{i=1}^{n-k} |w_{\bar{K}_i}|. \tag{7}$$

Conversely, there exist some measurement matrix $A$, a set $K$ with cardinality $k$, an $x$, and a corresponding $\hat{x}$ ($\hat{x}$ is a minimizer of the program (2)), such that (7) is satisfied with equality for some vector $w$ in the null space of $A$ with a constant $C' > 1$; moreover

$$\|x - \hat{x}\|_1 = \frac{2(C'+1)}{C'-1}\,\|x_{\bar K}\|_1$$

and

$$\|x - \hat{x}\|_1 > \frac{2(C+1)}{C-1}\,\|x_{\bar K}\|_1$$

for any $C$ bigger than the constant $C'$.

Proof: First, suppose the matrix $A$ has the claimed null space property as in (7), and we want to prove that any solution $\hat{x}$ satisfies (6). Note that the solution $\hat{x}$ of (2) satisfies

$$\|\hat{x}\|_1 \le \|x\|_1,$$

where $x$ is the original signal. Since $A\hat{x} = y$, it easily follows that $w = \hat{x} - x$ is in the null space of $A$. Therefore we can further write $\|x\|_1 \ge \|x + w\|_1$. Using the triangle inequality for the $\ell_1$ norm we obtain

$$\begin{aligned}
\|x_K\|_1 + \|x_{\bar K}\|_1 = \|x\|_1
&\ge \|\hat{x}\|_1 = \|x + w\|_1 \\
&\ge \|x_K\|_1 - \|w_K\|_1 + \|w_{\bar K}\|_1 - \|x_{\bar K}\|_1 \\
&\ge \|x_K\|_1 - \|x_{\bar K}\|_1 + \frac{C-1}{C+1}\,\|w\|_1,
\end{aligned}$$

where the last inequality is from the claimed null space property. Relating the head and tail of the inequality chain above,

$$2\|x_{\bar K}\|_1 \ge \frac{C-1}{C+1}\,\|w\|_1,$$

which is exactly (6), since $\|w\|_1 = \|x - \hat{x}\|_1$.
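The last inequality in the chain uses (7) through the following elementary step, written out here for completeness with the shorthand $a = \|w_K\|_1$ and $b = \|w_{\bar K}\|_1$ (introduced only for this derivation), so that (7) reads $b \ge Ca$:

```latex
(C+1)(b - a) - (C-1)(a + b) = 2(b - Ca) \ge 0
\;\Longrightarrow\;
\|w_{\bar K}\|_1 - \|w_K\|_1 \ge \frac{C-1}{C+1}\,\|w\|_1 .
```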

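Now we prove the second part of the theorem, namely that when (7) is violated, there exist scenarios where the error performance bound (6) fails. The simplest example is when the null space of the measurement matrix $A$ is a one-dimensional subspace with the all-ones vector $(1, 1, \ldots, 1)$ as its basis. Let $n$ be an even number. For any $k < n/2$, let us take $C' = \frac{n-k}{k}$ and $C = \frac{n-k}{k} + \epsilon$, where $\epsilon > 0$ is an arbitrarily small positive number. Then obviously there exists a vector $w$ in the null space of $A$ that violates the condition (7) for $C = \frac{n-k}{k} + \epsilon$ for the set $K = \{1, 2, \ldots, k\}$ (any nonzero multiple of the all-ones vector does, and it satisfies (7) with equality for $C'$). Now we consider a signal vector

$$x = (\underbrace{-1, -1, \ldots, -1}_{n/2}, \underbrace{0, 0, \ldots, 0}_{n/2}).$$

Taking the null space of $A$ into account, we can see that

$$\hat{x} = (\underbrace{0, 0, \ldots, 0}_{n/2}, \underbrace{1, 1, \ldots, 1}_{n/2})$$

is a minimizer of the program (2): every feasible point has the form $x + t(1, \ldots, 1)$, whose $\ell_1$ norm $\frac{n}{2}(|t-1| + |t|)$ is minimized by any $t \in [0, 1]$, and $t = 1$ gives $\hat{x}$.

Note that $\|x_{\bar K}\|_1 = \frac{n}{2} - k$ and $\|x - \hat{x}\|_1 = n$, so

$$\frac{\|x - \hat{x}\|_1}{\|x_{\bar K}\|_1} = \frac{n}{\frac{n}{2} - k} = \frac{2(C'+1)}{C'-1} > \frac{2(C+1)}{C-1},$$

strictly contradicting the error bound (6). ■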
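The counterexample can be reproduced numerically for a small even $n$. The sketch below takes $n = 10$ and $k = 2$ (arbitrary choices), confirms that $\hat{x}$ attains the minimum of $\|x + t(1, \ldots, 1)\|_1$ over $t$ by evaluating the piecewise-linear objective at its breakpoints, and checks that the error ratio equals $2(C'+1)/(C'-1)$ with $C' = (n-k)/k$. The helper name `l1_min_on_line` is hypothetical.

```python
import numpy as np


def l1_min_on_line(x, v):
    """Minimum of ||x + t v||_1 over real t. The objective is convex and
    piecewise linear with breakpoints t = -x_i / v_i, so it suffices to
    evaluate it at those breakpoints."""
    nz = v != 0
    candidates = -x[nz] / v[nz]
    return min(np.abs(x + t * v).sum() for t in candidates)


n, k = 10, 2                               # n even, k < n/2
ones = np.ones(n)                          # basis of the null space of A
x = np.concatenate([-np.ones(n // 2), np.zeros(n // 2)])
x_hat = x + ones                           # (0, ..., 0, 1, ..., 1)
C_prime = (n - k) / k

tail = np.abs(x[k:]).sum()                 # ||x_Kbar||_1 for K = first k indices
ratio = np.abs(x - x_hat).sum() / tail     # n / (n/2 - k)
```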

It should be noted that if the condition (7) is true for all the sets $K$ of cardinality $k$, then

$$2\|x_{\bar K}\|_1 \ge \frac{C-1}{C+1}\,\|x - \hat{x}\|_1$$

is also true for the set $K$ which corresponds to the $k$ largest (in amplitude) components of the vector $x$. So

$$2\Delta \ge \frac{C-1}{C+1}\,\|x - \hat{x}\|_1,$$

which exactly corresponds to (3). In fact, the condition (7) is also a necessary and sufficient condition for unique exact recovery of ideally $k$-sparse signals after we take $C = 1$ and let (7) take strict inequality for all $w \neq 0$ in the null space of $A$. To see this, suppose the ideally $k$-sparse signal $x$ is supported over the set $K$, namely, $\|x_{\bar K}\|_1 = 0$. Then from the same triangle inequality derivation as in Theorem 2, we know that $\|x - \hat{x}\|_1 = 0$, namely $\hat{x} = x$. Alternatively, we can let $C$ be arbitrarily close to 1 from the right, and since

$$\|x - \hat{x}\|_1 \le \frac{2(C+1)}{C-1}\,\|x_{\bar K}\|_1 = 0,$$

we also get $\hat{x} = x$. In this sense, when $C = 1$, the null space condition is equivalent to the neighborly polytope condition [Don06b] for unique exact recovery of ideally sparse signals.

However, it is an interesting result that, for a particular fixed measurement matrix $A$, the violation of (7) for some $C > 1$ does not necessarily imply the existence of a vector $x$ and a minimizer $\hat{x}$ of (2) such that the performance guarantee (6) is violated. For example, assume $n = 2$ and the null space of the measurement matrix $A$ is a one-dimensional subspace with the vector $(1, 100)$ as its basis. Then the null space of the matrix $A$ violates (7) with $C = 101$ and the set $K = \{1\}$. But a careful examination shows that the biggest possible $\frac{\|x - \hat{x}\|_1}{\|x_{\bar K}\|_1}$ (with $\|x_{\bar K}\|_1 \neq 0$) is equal to $\frac{101}{100}$, achieved by such an $x$ as $(-1, -1)$. In fact, all those vectors $x = (a, b)$ with $b \neq 0$ will achieve $\frac{\|x - \hat{x}\|_1}{\|x_{\bar K}\|_1} = \frac{101}{100}$, since the unique minimizer of (2) is then $\hat{x} = (a - b/100, 0)$. However, (6) has $\frac{2(C+1)}{C-1} = \frac{204}{100}$. This suggests that, for a specific measurement matrix $A$, the tightest error bound for $\frac{\|x - \hat{x}\|_1}{\|x_{\bar K}\|_1}$ should involve the detailed structure of the null space of $A$. But for general measurement matrices $A$, as suggested by Theorem 2, the condition (7) is a necessary and sufficient condition to offer the performance guarantee (6).
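The two-dimensional example can be checked numerically. With a one-dimensional null space spanned by $v$, the objective $\|x + tv\|_1$ is convex and piecewise linear in $t$, so a minimizer sits at one of the breakpoints $t = -x_i/v_i$. The sketch below (hypothetical helper `l1_decode_on_line`, arbitrary seed and sample count) decodes random $x = (a, b)$ with $v = (1, 100)$ and $K = \{1\}$, and records the ratio $\|x - \hat{x}\|_1 / \|x_{\bar K}\|_1$, which should stay at $101/100$, well below the value $2(C+1)/(C-1) = 204/100$ from (6) with $C = 101$.

```python
import numpy as np


def l1_decode_on_line(x, v):
    """Return a minimizer x + t v of ||x + t v||_1 (null space spanned by v),
    searching the breakpoints t = -x_i / v_i of the piecewise-linear objective."""
    nz = v != 0
    candidates = -x[nz] / v[nz]
    values = [np.abs(x + t * v).sum() for t in candidates]
    return x + candidates[int(np.argmin(values))] * v


v = np.array([1.0, 100.0])     # null-space basis in the example
C = 101.0                      # (7) fails for K = {1} with this C
rng = np.random.default_rng(1)
ratios = []
for _ in range(1000):
    x = rng.standard_normal(2)                          # x = (a, b); b != 0 almost surely
    x_hat = l1_decode_on_line(x, v)
    ratios.append(np.abs(x - x_hat).sum() / abs(x[1]))  # ||x_Kbar||_1 = |b|
```

Every sampled ratio equals $101/100$ up to rounding, because the minimizer is always $\hat{x} = (a - b/100, 0)$.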

It is worth pointing out that the example given in the proof of Theorem 2 is not just an isolated example. In fact, for two general positive integers $m$ and $n$ with $m < n$ and $n \ge 2$, we can often find an $m \times n$ measurement matrix $A$ and a certain $C > 1$ such that the condition (7) is violated and, at the same time, for some vector $x$, the performance bound is also "tightly" violated.

Consider a generic $m \times n$ matrix $A'$. For each integer $1 \le k \le n$, let us define the quantity $h_k$ as the supremum of $\frac{\|w_K\|_1}{\|w_{\bar K}\|_1}$ over all sets $K$ of size $|K| \le k$ and over all nonzero vectors $w$ in the null space of $A'$. Let $k^*$ be the biggest $k$ such that $h_k < 1$. Then there must be a nonzero vector $w'$ in the null space of $A'$ and a set $K^*$ of size $k^*$, such that

$$\|w'_{K^*}\|_1 = h_{k^*}\,\|w'_{\bar{K}^*}\|_1.$$

Now we generate a new measurement matrix $A$ by multiplying the portion $A'_{K^*}$ of the matrix $A'$ by $h_{k^*}$. Then we will have a vector $w$ in the null space of $A$ satisfying

$$\|w_{K^*}\|_1 = \|w_{\bar{K}^*}\|_1.$$

Now we take a signal vector $x = (-w_{K^*}, 0_{\bar{K}^*})$ and claim that $\hat{x} = (0_{K^*}, w_{\bar{K}^*})$ is a minimizer of the program (2). In fact, recognizing the definition of $h_{k^*}$, we know that all the vectors $w''$ in the null space of the measurement matrix $A$ will satisfy $\|x + w''\|_1 \ge \|x\|_1$. Let us assume that $k^* \ge 2$ and take $K \subset K^*$ as the index set corresponding to the largest $(k^* - i)$ elements of $x_{K^*}$ in amplitude, where $1 \le i \le (k^* - 1)$. From the definition of $k^*$, it is apparent that

$$C' = \frac{\|w_{\bar K}\|_1}{\|w_K\|_1} > 1,$$

since $w$ is nonzero for any index in the set $K^*$. Let us now take $C = \frac{\|w_{\bar K}\|_1}{\|w_K\|_1} + \epsilon$, where $\epsilon > 0$ is any arbitrarily small positive number. Thus the condition (7) is violated for the vector $w$, the set $K$ and the defined constant $C$.

Now by inspection, the decoding error is

$$\|x - \hat{x}\|_1 = \frac{2(C'+1)}{C'-1}\,\|x_{\bar K}\|_1 > \frac{2(C+1)}{C-1}\,\|x_{\bar K}\|_1,$$

violating the error bound (6) (for the set $K$).
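The "by inspection" step can be verified directly. Write $a = \|w_K\|_1$, $b = \|w_{K^* \setminus K}\|_1$ and $c = \|w_{\bar{K}^*}\|_1$ (shorthand introduced only here). The balance $\|w_{K^*}\|_1 = \|w_{\bar{K}^*}\|_1$ gives $c = a + b$; since $\hat{x} - x = w$ and $x_{\bar K}$ consists of $-w_{K^* \setminus K}$ padded with zeros, $\|x_{\bar K}\|_1 = b$. Then:

```latex
C' = \frac{\|w_{\bar K}\|_1}{\|w_K\|_1} = \frac{b + c}{a} = \frac{a + 2b}{a},
\qquad
\frac{\|x - \hat{x}\|_1}{\|x_{\bar K}\|_1} = \frac{\|w\|_1}{b}
  = \frac{2(a + b)}{b}
  = \frac{2(C'+1)}{C'-1}.
```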

In the remaining part of this paper, for a given value $\delta = m/n$ and any value $C \ge 1$, we will devote our efforts to determining the values of feasible $\zeta = \rho\delta = k/n$ for which there exists a sequence of matrices $A$ such that the null space condition (7) is satisfied for all the sets $K$ of size $k$ when $n$ goes to infinity and $m/n = \delta$. For a specific $A$, it is very hard to check whether the condition (7) is satisfied or not. Instead, we consider randomly choosing $A$ from a Gaussian distribution, and analyze for what $\zeta$ the condition (7) for its null space is satisfied with overwhelming probability as $n$ goes to infinity. When we consider $C = 1$, corresponding to the success of $\ell_1$ minimization for all ideally $k$-sparse signals, loose bounds on $\zeta$ achieving the null space condition were established in [CT05][Zha06][SXH08a] using the restricted isometry property and high dimensional geometrical results. The null space condition is equivalent to the $k$-neighborly polytope condition when $C = 1$, so the neighborly polytope condition [Don06b] gives much sharper bounds for the null space condition when $C = 1$. However, no sharp bounds are available for the null space condition in the general case $C > 1$.

The standard results on compressive sensing assume that the matrix $A$ has i.i.d. $\mathcal{N}(0, 1)$ entries. The following lemma gives a characterization of the resulting null space of $A$; it is a fairly well known result, and for the sake of completeness we include its proof in the appendix.

Lemma 1: Let $A \in \mathbb{R}^{m \times n}$ be a random matrix with i.i.d. $\mathcal{N}(0, 1)$ entries. Then the following statements hold:

• The distribution of $A$ is right-rotationally invariant: for any $\Theta$ satisfying $\Theta\Theta^* = \Theta^*\Theta = I$, $p_A(A) = p_A(A\Theta)$.

• There exists a basis $Z$ of the null space of $A$ such that the distribution of $Z$ is left-rotationally invariant: for any $\Theta$ satisfying $\Theta\Theta^* = \Theta^*\Theta = I$, $p_Z(Z) = p_Z(\Theta^* Z)$.

• It is always possible to choose a basis $Z$ for the null space such that $Z$ has i.i.d. $\mathcal{N}(0, 1)$ entries.
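Lemma 1 suggests a simple way to sample the uniformly distributed null space used throughout the analysis. The sketch below draws $A$ with i.i.d. $\mathcal{N}(0,1)$ entries and extracts an orthonormal basis $Z$ of its null space from the SVD; by the right-rotational invariance of $A$, the span of $Z$ is uniformly distributed on $\mathrm{Gr}_{(n-m)}(n)$ (no claim is made about the distribution of the particular orthonormal basis returned). The function name `gaussian_null_space` is hypothetical.

```python
import numpy as np


def gaussian_null_space(m, n, seed=None):
    """Draw A (m x n) with i.i.d. N(0, 1) entries and return (A, Z), where the
    columns of Z are an orthonormal basis of the null space of A."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    # With rank(A) = m (true almost surely), the last n - m right singular
    # vectors span the null space of A.
    _, _, Vt = np.linalg.svd(A)
    Z = Vt[m:].T                           # shape (n, n - m)
    return A, Z


A, Z = gaussian_null_space(4, 9, seed=0)
```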



May 21, 2010 DRAFT 




In view of Theorem 1 and Lemma 1, what matters is that the null space of $A$ be rotationally invariant. Sampling from this rotationally invariant distribution is equivalent to uniformly sampling a random $(n-m)$-dimensional subspace from the Grassmann manifold $\mathrm{Gr}_{(n-m)}(n)$. Here the Grassmann manifold $\mathrm{Gr}_{(n-m)}(n)$ is the set of $(n-m)$-dimensional subspaces in the $n$-dimensional Euclidean space $\mathbb{R}^n$ [Boo86]. For any such $A$ and ideally sparse signals, the sharp bounds of [Don06b] apply. However, we shall see that the neighborly polytope condition for ideally sparse signals does not readily apply to the proposed null space condition analysis for approximately sparse signals, since the null space condition cannot be transformed into the $k$-neighborly property of a single high-dimensional polytope [Don06b]. Instead, in this paper, we shall give a unified Grassmann angle framework to analyze the proposed null space property.

III. The Grassmann Angle Framework for the Null Space Characterization 

In this section we detail the Grassmann angle-based framework for analyzing the bounds on $\zeta = k/n$ such that (7) holds for every vector in the null space, which we denote by $Z$. Put more precisely, given a certain constant $C > 1$ (or $C \ge 1$), which corresponds to a certain level of recovery accuracy for the approximately sparse signals, we are interested in what scaling $k/n$ we can achieve while satisfying the following condition on $Z$ ($|K| = k$):

$$\forall w \in Z,\ \forall K \subseteq \{1, 2, \ldots, n\},\quad C\|w_K\|_1 \le \|w_{\bar K}\|_1. \tag{8}$$

From the definition of the condition (8), there is a tradeoff between the largest sparsity level $k$ and the parameter $C$. As $C$ grows, the largest $k$ satisfying (8) cannot increase, and, at the same time, $\ell_1$ minimization becomes more robust in terms of the residual norm $\|x_{\bar K}\|_1$. The key in our derivation is the following lemma:

Lemma 2: For a certain subset $K \subseteq \{1, 2, \ldots, n\}$ with $|K| = k$, the event that the null space $Z$ satisfies

$$C\|w_K\|_1 \le \|w_{\bar K}\|_1, \quad \forall w \in Z$$

is equivalent to the event that $\forall x$ supported on the $k$-set $K$ (or supported on a subset of $K$):

$$\|x_K + w_K\|_1 + \left\|\frac{w_{\bar K}}{C}\right\|_1 \ge \|x_K\|_1, \quad \forall w \in Z. \tag{9}$$
Proof: First, let us assume that $C\|w_K\|_1 \le \|w_{\bar K}\|_1$, $\forall w \in Z$. Using the triangle inequality, we obtain

$$\|x_K + w_K\|_1 + \left\|\frac{w_{\bar K}}{C}\right\|_1 \ge \|x_K\|_1 - \|w_K\|_1 + \left\|\frac{w_{\bar K}}{C}\right\|_1 \ge \|x_K\|_1,$$

thus proving the forward part of this lemma. Now let us assume instead that $\exists w \in Z$ such that $C\|w_K\|_1 > \|w_{\bar K}\|_1$. Then we can construct a vector $x$ supported on the set $K$ (or a subset of $K$), with $x_K = -w_K$. Then we have

$$\|x_K + w_K\|_1 + \left\|\frac{w_{\bar K}}{C}\right\|_1 = 0 + \left\|\frac{w_{\bar K}}{C}\right\|_1 < \|x_K\|_1,$$

proving the converse part of this lemma. ■

Now let us consider the probability that condition (8) holds for the sparsity $|K| = k$ if we uniformly sample a random $(n-m)$-dimensional subspace $Z$ from the Grassmann manifold $\mathrm{Gr}_{(n-m)}(n)$. Based on Lemma 2, we can equivalently consider the complementary probability $P$ that there exist a subset $K \subseteq \{1, 2, \ldots, n\}$ with $|K| = k$ and a vector $x \in \mathbb{R}^n$ supported on the set $K$ (or a subset of $K$) failing the condition (9). With the linearity of the subspace $Z$ in mind, to obtain $P$, we can restrict our attention to those vectors $x$ from the cross-polytope (the unit $\ell_1$ ball)

$$\{x \in \mathbb{R}^n \mid \|x\|_1 = 1\}$$

that are only supported on the set $K$ (or a subset of $K$).

First, we upper bound the probability $P$ by a union bound over all the possible support sets $K \subseteq \{1, 2, \ldots, n\}$ and all the sign patterns of the $k$-sparse vector $x$. Since the $k$-sparse vector $x$ has $\binom{n}{k}$ possible support sets of cardinality $k$ and $2^k$ possible sign patterns (nonnegative or nonpositive), we have

$$P \le \binom{n}{k} \times 2^k \times P_{K,-}, \tag{10}$$

where $P_{K,-}$ is the probability that, for a specific support set $K$, there exists a $k$-sparse vector $x$ of a specific sign pattern which fails the condition (9). By symmetry, without loss of generality, we assume the signs of the elements of $x$ to be nonpositive.

Fig. 2: The Grassmann Angle for a Skewed Cross-polytope
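For the union bound (10) to be useful, $P_{K,-}$ must decay exponentially faster than the combinatorial prefactor $\binom{n}{k}2^k$ grows. By Stirling's formula, with $k = \zeta n$ this prefactor grows like $e^{n(H(\zeta) + \zeta \ln 2)}$, where $H$ is the binary entropy in nats. The sketch below (hypothetical helper names, arbitrary $n$ and $k$) evaluates both the exact logarithm and this limiting rate; it only concerns the prefactor, not the paper's bound on $P_{K,-}$.

```python
import math


def log_union_prefactor(n, k):
    """Natural log of binom(n, k) * 2**k, the number of (support, sign) pairs in (10)."""
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_binom + k * math.log(2.0)


def prefactor_rate(zeta):
    """Limit of log_union_prefactor(n, zeta * n) / n as n grows:
    H(zeta) + zeta * log 2, with H the binary entropy in nats."""
    h = -zeta * math.log(zeta) - (1.0 - zeta) * math.log(1.0 - zeta)
    return h + zeta * math.log(2.0)


n, k = 100_000, 10_000
finite_rate = log_union_prefactor(n, k) / n
limit_rate = prefactor_rate(k / n)
```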

So now let us focus on deriving the probability $P_{K,-}$. Since $x$ is a nonpositive $k$-sparse vector supported on the set $K$ (or a subset of $K$) and can be restricted to the cross-polytope $\{x \in \mathbb{R}^n \mid \|x\|_1 = 1\}$, $x$ is also on a $(k-1)$-dimensional face, denoted by $F$, of the skewed cross-polytope (weighted $\ell_1$ ball) $SP$:

$$SP = \left\{y \in \mathbb{R}^n \;\middle|\; \|y_K\|_1 + \left\|\frac{y_{\bar K}}{C}\right\|_1 \le 1\right\}. \tag{11}$$



Then $P_{K,-}$ is the probability that there exists an $x \in F$, and there exists a $w \in Z$ ($w \neq 0$) such that

$$\|x_K + w_K\|_1 + \left\|\frac{w_{\bar K}}{C}\right\|_1 \le \|x_K\|_1 = 1. \tag{12}$$
We first focus on studying a specific single point $x \in F$, without loss of generality assumed to be in the relative interior of this $(k-1)$-dimensional face $F$. For this single particular $x$ on $F$, the probability, denoted by $P'_{K,-}$, that $\exists w \in Z$ ($w \neq 0$) such that (12) holds is essentially the probability that a uniformly chosen $(n-m)$-dimensional subspace $Z$ shifted by the point $x$, namely $(Z + x)$, intersects the skewed cross-polytope

$$SP = \left\{y \in \mathbb{R}^n \;\middle|\; \|y_K\|_1 + \left\|\frac{y_{\bar K}}{C}\right\|_1 \le 1\right\} \tag{13}$$

nontrivially, namely, at some other point besides $x$.

From the linear property of the subspace $Z$, the event that $(Z + x)$ intersects the skewed cross-polytope $SP$ is equivalent to the event that $Z$ intersects nontrivially with the cone SP-Cone($x$) obtained by observing the skewed polytope $SP$ from the point $x$. (Namely, SP-Cone($x$) is the conic hull of the point set $(SP - x)$, and SP-Cone($x$) has the origin of the coordinate system as its apex.) Moreover, as noticed in the geometry of convex polytopes [Grü68][Grü03], the cones SP-Cone($x$) are identical for all $x$ lying in the relative interior of the face $F$. This means that the probability $P_{K,-}$ is equal to $P'_{K,-}$, even though $x$ is only a single point in the relative interior of the face $F$. (The acute reader may have noticed some singularities here because $x \in F$ may not be in the relative interior of $F$, but it turns out that SP-Cone($x$) is then only a subset of the cone we get when $x$ is in the relative interior of $F$. So we do not lose anything if we restrict $x$ to be in the relative interior of the face $F$.) In summary, we have

$$P_{K,-} = P'_{K,-}.$$

Now we only need to determine $P'_{K,-}$. From its definition, $P'_{K,-}$ is exactly the complementary Grassmann angle [Grü68] for the face $F$ with respect to the polytope $SP$ under the Grassmann manifold $\mathrm{Gr}_{(n-m)}(n)$:^2 the probability of a uniformly distributed $(n-m)$-dimensional subspace $Z$ from the Grassmannian manifold $\mathrm{Gr}_{(n-m)}(n)$ intersecting nontrivially with the cone SP-Cone($x$) formed by observing the skewed cross-polytope $SP$ from the relative interior point $x \in F$.

Building on the works of L. A. Santaló [San52] and P. McMullen [McM75], among others, in high dimensional integral geometry and convex polytopes, the complementary Grassmann angle for the $(k-1)$-dimensional face $F$ can be explicitly expressed as a sum of products of internal angles and external angles [Grü03]:

$$2 \times \sum_{s \ge 0} \sum_{G \in \mathfrak{F}_{m+1+2s}(SP)} \beta(F, G)\,\gamma(G, SP), \tag{14}$$

where $s$ is any nonnegative integer, $G$ is any $(m+1+2s)$-dimensional face of the skewed cross-polytope ($\mathfrak{F}_{m+1+2s}(SP)$ is the set of all such faces), $\beta(\cdot, \cdot)$ stands for the internal angle and $\gamma(\cdot, \cdot)$ stands for the external angle.

^2 A Grassmann angle and its corresponding complementary Grassmann angle always sum up to 1. There is an apparent inconsistency between [Grü68], [AS92] and [VS92], etc., regarding which is called the "Grassmann angle" and which the "complementary Grassmann angle". We will stick to the earliest definition in [Grü68] for the Grassmann angle: the measure of the subspaces that intersect trivially with a cone.

The internal angles and external angles are basically defined as follows [Grü03][McM75]:

• An internal angle $\beta(F_1, F_2)$ is the fraction of the hypersphere $S$ covered by the cone obtained by observing the face $F_2$ from the face $F_1$.^3 The internal angle $\beta(F_1, F_2)$ is defined to be zero when $F_1 \not\subseteq F_2$ and is defined to be one if $F_1 = F_2$.

• An external angle $\gamma(F_3, F_4)$ is the fraction of the hypersphere $S$ covered by the cone of outward normals to the hyperplanes supporting the face $F_4$ at the face $F_3$. The external angle $\gamma(F_3, F_4)$ is defined to be zero when $F_3 \not\subseteq F_4$ and is defined to be one if $F_3 = F_4$.

As an example, take the 2-dimensional skewed cross-polytope

$$SP = \{(y_1, y_2) \in \mathbb{R}^2 \mid \|y_2\|_1 + \|y_1 / C\|_1 \le 1\}$$

(namely the diamond) in Figure 2, where $n = 2$, $(n-m) = 1$ and $k = 1$. The point $x = (0, -1)$ is a 0-dimensional face (namely a vertex) of the skewed polytope SP. From the definitions, the internal angle is $\beta(x, SP) = \arctan(C)/\pi$ and the external angle is $\gamma(x, SP) = 1/2 - \arctan(C)/\pi$, while $\gamma(SP, SP) = 1$. The complementary Grassmann angle for the vertex $x$ with respect to the polytope SP is the probability that a uniformly sampled 1-dimensional subspace (namely a line, which we denote by $Z$) shifted by $x$ intersects SP nontrivially. Equivalently, it is the probability that $Z$ intersects nontrivially the cone obtained by observing SP from the point $x$. Since this cone has opening angle $2\arctan(C)$, the probability is $2\arctan(C)/\pi$. Readers can easily verify formula (14) for this toy example: the only face $G$ in (14) is SP itself ($s = 0$), and $2\beta(x, SP)\gamma(SP, SP) = 2\arctan(C)/\pi$.
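The toy example can also be checked numerically. The following Python sketch (standard library only) is an illustration under stated assumptions and is not part of the original analysis. It takes SP to be the diamond $\{|y_2| + |y_1|/C \le 1\}$ with vertices $(\pm C, 0)$ and $(0, \pm 1)$, and it uses the elementary closed forms $\beta(x, SP) = \arctan(C)/\pi$ and $\gamma(x, SP) = 1/2 - \arctan(C)/\pi$ for this planar diamond. It then compares a Monte Carlo estimate of the complementary Grassmann angle at $x = (0, -1)$ with the value $2\beta(x, SP)\gamma(SP, SP)$ given by (14). The function names are hypothetical.

```python
import math
import random


def in_diamond(y1, y2, C):
    """Membership in SP = {(y1, y2) : |y2| + |y1| / C <= 1} (coordinate 2 is in K)."""
    return abs(y2) + abs(y1) / C <= 1.0


def line_enters_sp(phi, C, t=1e-6):
    """True if the line through x = (0, -1) with direction angle phi meets SP in more
    than the single point x, i.e. one of the short steps x +/- t*d stays inside SP."""
    d1, d2 = math.cos(phi), math.sin(phi)
    return (in_diamond(t * d1, -1.0 + t * d2, C)
            or in_diamond(-t * d1, -1.0 - t * d2, C))


def toy_example(C, samples=200_000, seed=0):
    """Monte Carlo estimate of the complementary Grassmann angle vs. formula (14)."""
    rng = random.Random(seed)
    # A uniformly random line (1-dimensional subspace) has direction angle uniform on [0, pi).
    hits = sum(line_enters_sp(rng.uniform(0.0, math.pi), C) for _ in range(samples))
    beta = math.atan(C) / math.pi             # internal angle beta(x, SP)
    gamma_x = 0.5 - math.atan(C) / math.pi    # external angle gamma(x, SP)
    formula_14 = 2.0 * beta * 1.0             # only G = SP (s = 0), with gamma(SP, SP) = 1
    return hits / samples, formula_14, beta, gamma_x


mc, f14, beta, gamma_x = toy_example(C=2.0)
print(f"Monte Carlo: {mc:.4f}   formula (14): {f14:.4f}")
```

With $C = 2$ the two numbers should agree to within Monte Carlo error, which is about 0.001 for 200,000 samples.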

In general it may be hard to give explicit formulae for the external and internal angles involved. Fortunately, in the skewed cross-polytope case, both the internal angles and the external angles can be computed explicitly.

Footnote: The dimension of the hypersphere $S$ here matches the dimension of the corresponding cone, and the center of the hypersphere is the apex of that cone. The same conventions apply to the definition of the external angles.
First, consider the internal angle $\beta(F, G)$ between the $(k-1)$-dimensional face $F$ and an $(l-1)$-dimensional face $G$. The only interesting case is $F \subseteq G$, since $\beta(F, G) \neq 0$ only if $F \subseteq G$. We will see that if $F \subseteq G$, the cone formed by observing $G$ from $F$ is the direct sum of a $(k-1)$-dimensional linear subspace and a convex polyhedral cone. This polyhedral cone is formed by $(l-k)$ unit vectors with pairwise inner product $\frac{1}{1+C^2k}$. In this case, the internal angle is given by

$$\beta(F, G) = \frac{V_{l-k-1}\left(\frac{1}{1+C^2k},\, l-k-1\right)}{V_{l-k-1}(S^{l-k-1})}, \qquad (15)$$

where $V_i(S^i)$ denotes the $i$-dimensional surface measure of the unit sphere $S^i$. Similarly, $V_i(\alpha', i)$ denotes the surface measure of a regular spherical simplex with $(i+1)$ vertices on the unit sphere $S^i$ and with inner product $\alpha'$ between each pair of these vertices. Thus (15) is equal to $B(\alpha', m')$ with $\alpha' = \frac{1}{1+C^2k}$ and $m' = l-k$, where

$$B(\alpha', m') = \theta^{\frac{m'-1}{2}} \sqrt{(m'-1)\alpha' + 1}\; \pi^{-m'/2}\, \alpha'^{-1/2}\, J(m', \theta), \qquad (16)$$

with $\theta = (1-\alpha')/\alpha'$ and

$$J(m', \theta) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \left( \int_{0}^{\infty} e^{-\theta v^2 + 2 i v \lambda}\, dv \right)^{m'} e^{-\lambda^2}\, d\lambda. \qquad (17)$$

This formula for the internal angle holds only when the face $G$ is not of dimension $n$. When $G$ is $n$-dimensional, we derive a separate formula in Lemma 15. That special case does not affect the following derivations in any significant way, so we do not list it here.

Second, the external angle $\gamma(G, SP)$ between the $(l-1)$-dimensional face $G$ and the skewed cross-polytope SP is

$$\gamma(G, SP) = \frac{2^{n-l}}{\sqrt{\pi}^{\,n-l+1}} \int_{0}^{\infty} e^{-x^2} \left( \int_{0}^{x/\sqrt{(C^2-1)k + l}} e^{-y^2}\, dy \right)^{n-l} dx. \qquad (18)$$

Deriving these expressions requires computing the volumes of cones in high-dimensional geometry. The derivations are presented in the appendix.
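To get a feel for the size of the external angle, (18) can be evaluated by one-dimensional quadrature. The sketch below is a minimal illustration that assumes the form of (18) given above. The composite Simpson rule and the truncation point of the outer integral are our own choices, and the helper names are hypothetical. As sanity checks, a facet ($l = n$) should have external angle $1/2$, and setting $C = 1$ recovers the $\sqrt{l}$ scaling of the regular cross-polytope in [Don06b].

```python
import math


def simpson(f, a, b, panels=2000):
    """Composite Simpson rule on [a, b]; the number of panels is made even."""
    panels += panels % 2
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def external_angle(n, l, k, C, upper=10.0):
    """gamma(G, SP) of (18) for an (l-1)-face G containing the (k-1)-face F.

    Since int_0^a exp(-y^2) dy = (sqrt(pi)/2) erf(a), the prefactor
    2^(n-l) / sqrt(pi)^(n-l+1) collapses to 1/sqrt(pi).  The outer integral is
    truncated at `upper`, where exp(-x^2) is negligible.
    """
    s = math.sqrt((C * C - 1.0) * k + l)
    integrand = lambda x: math.exp(-x * x) * math.erf(x / s) ** (n - l)
    return simpson(integrand, 0.0, upper) / math.sqrt(math.pi)


# A facet (l = n) should give 1/2; lower-dimensional faces give smaller values.
print(external_angle(30, 30, 4, 2.0), external_angle(40, 35, 4, 2.0), external_angle(40, 30, 4, 2.0))
```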

In summary, combining (10), (14), (15) and (18) gives an upper bound on the probability $P$. Suppose we can show that, for a certain $\zeta = k/n$, $P$ goes to zero exponentially in $n$ as $n \to \infty$. Then, for such $\zeta$, the null space condition (8) holds with overwhelming probability. This is the guideline for computing the bound on $\zeta$ in the following sections.
IV. Evaluating the Bound ζ

In summary,

$$P \le \binom{n}{k} \times 2^k \times 2 \times \sum_{s \ge 0} \sum_{G \in \Im_{m+1+2s}(SP)} \beta(F, G)\, \gamma(G, SP). \qquad (19)$$

For this upper bound on $P$ to decrease to 0 as $n \to \infty$, it suffices that every sum term in (19) goes to 0 exponentially fast in $n$. The expression in (19) is similar to the expected number of missed "faces" in the study of $k$-neighborly polytopes [Don06b], [VS92]. It generalizes the $k$-neighborly polytope formula to more general Grassmann angles. In the following sections, we extend the techniques developed in [Don06b], [VS92] to evaluate the bounds on $\zeta$ from (19), taking into account the variable $C > 1$. We keep the detailed derivations, both to illustrate the effect of $C$ on the bound $\zeta$ and for completeness.

For simplicity of analysis, we define $l = (m+1+2s)+1$ and $\nu = l/n$. The skewed cross-polytope SP has in total $\binom{n-k}{l-k} 2^{l-k}$ faces $G$ of dimension $(l-1)$ such that $F \subseteq G$ and $\beta(F, G) \neq 0$. Because of the symmetry of SP, it follows from (19) that

$$P \le \sum_{s \ge 0} D_s, \qquad D_s = \underbrace{2 \binom{n}{k} 2^{l} \binom{n-k}{l-k}}_{COM_s} \beta(F, G)\, \gamma(G, SP), \qquad (20)$$

where $l = (m+1+2s)+1$ and $G \subseteq SP$ is any single face of dimension $(l-1)$ such that $F \subseteq G$.

Closely following the approach of [Don06b], we decompose $n^{-1}\log(D_s)$ into a sum of terms: the logarithms of the combinatorial factor, the internal angle and the external angle. Let

$$H(p) = p \log(1/p) + (1-p) \log(1/(1-p)),$$

where the logarithm is to base $e$. From Stirling's formula, we know that

$$n^{-1} \log \binom{n}{\lfloor pn \rfloor} \to H(p), \qquad p \in [0, 1],\ n \to \infty. \qquad (21)$$
Defining $\nu = l/n \ge \delta$, we have

$$n^{-1} \log(COM_s) = \nu \log(2) + H(\rho\delta) + H\!\left(\frac{\nu - \rho\delta}{1 - \rho\delta}\right)(1 - \rho\delta) + R_1, \qquad (22)$$

with remainder $R_1 = R_1(s, k, m, n)$.

Define the combinatorial growth exponent for $COM_s$ as

$$\psi_{com}(\nu; \rho, \delta) = \nu \log(2) + H(\rho\delta) + H\!\left(\frac{\nu - \rho\delta}{1 - \rho\delta}\right)(1 - \rho\delta), \qquad (23)$$

which describes the exponential growth of the combinatorial factors. Applying (21), the remainder $R_1$ in (22) is $o(1)$ uniformly in the range $l \ge \delta n$, $n \ge n_0(\rho, \delta, \epsilon)$, where $n_0(\rho, \delta, \epsilon)$ is a sufficiently large natural number.
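The combinatorial exponent and the remainder $R_1$ in (22) are easy to check numerically. The following sketch (Python, standard library only; names are illustrative) computes $\psi_{com}$ from (23). It compares this with $n^{-1}\log(COM_s)$, computed exactly through log-Gamma functions. The parameter values are arbitrary examples.

```python
import math


def H(p):
    """Entropy function with natural logarithm; H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def log_binom(n, k):
    """Natural log of the binomial coefficient, via log-Gamma to avoid overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def psi_com(nu, rho, delta):
    """Combinatorial growth exponent (23)."""
    rd = rho * delta
    return nu * math.log(2.0) + H(rd) + H((nu - rd) / (1.0 - rd)) * (1.0 - rd)


def log_com(n, k, l):
    """log COM_s, with COM_s = 2 * binom(n, k) * 2^l * binom(n - k, l - k) as in (20)."""
    return (1 + l) * math.log(2.0) + log_binom(n, k) + log_binom(n - k, l - k)


n, rho, delta, nu = 200_000, 0.2, 0.5, 0.7
k, l = round(rho * delta * n), round(nu * n)
R1 = log_com(n, k, l) / n - psi_com(nu, rho, delta)   # the remainder in (22)
print(f"psi_com = {psi_com(nu, rho, delta):.6f}, R1 = {R1:.2e}")
```

For these values $R_1$ should be tiny compared with $\psi_{com}$, consistent with the $O(\log n / n)$ corrections in Stirling's formula.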

For a particular $C$, we also define a decay exponent $\psi_{ext}(\nu; \rho, \delta)$. We show that $\gamma(G, SP)$ decays exponentially at least at the rate $\psi_{ext}(\nu; \rho, \delta)$: for each $\epsilon > 0$,

$$n^{-1} \log(\gamma(G, SP)) \le -\psi_{ext}(\nu; \rho, \delta) + \epsilon,$$

uniformly in $l \ge \delta n$, $n \ge n_0(\rho, \delta, \epsilon)$. When $C$ is clear from the context, we will often omit it from the notation.

Similarly, under the parameter $C$, Section VII below shows that the decay exponent for the internal angle $\beta(F, G)$ is $\psi_{int}(\nu; \rho, \delta)$, defined in Section V-B. Since $k \sim \rho\delta n$ and $l \sim \nu n$, we have the scaling

$$n^{-1} \log(\beta(F, G)) = -\psi_{int}(\nu; \rho, \delta) + R_2,$$

where the remainder $R_2 = o(1)$ uniformly in $l \ge \delta n$ when $n \ge n_0(\rho, \delta, \epsilon)$ is a sufficiently large natural number.

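In summary, under a given $C > 1$, for any fixed choice of $\rho$ and $\delta$, for $\epsilon > 0$, and for $n \ge n_0(\rho, \delta, \epsilon)$,

$$n^{-1} \log(D_s) \le \psi_{com}(\nu; \rho, \delta) - \psi_{int}(\nu; \rho, \delta) - \psi_{ext}(\nu; \rho, \delta) + 3\epsilon \qquad (24)$$

holds uniformly over the sum parameter $s$ in (14).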

In the rest of this paper, when the parameters $\rho$, $\delta$ and $C$ are clear from the context, we omit them from the notation for the combinatorial, internal and external exponents.
A. Characterizing ρ_N(δ, C)

Continuing to follow [Don06b], we define the net exponent $\psi_{net}(\nu; \rho, \delta) = \psi_{com}(\nu; \rho, \delta) - \psi_{int}(\nu; \rho, \delta) - \psi_{ext}(\nu; \rho, \delta)$. The components of $\psi_{net}$ are all continuous over the sets $\rho \in [\rho_0, 1]$, $\delta \in [\delta_0, 1]$, $\nu \in [\delta, 1]$. Hence $\psi_{net}$ is also continuous over these regions.

Definition 1: Let $\delta \in (0, 1]$. The critical proportion $\rho_N(\delta, C)$ is the supremum of $\rho \in [0, 1]$ satisfying

$$\psi_{net}(\nu; \rho, \delta) < 0, \qquad \nu \in [\delta, 1].$$

Continuity of $\psi_{net}$ shows that if $\rho < \rho_N$, then for some $\epsilon > 0$,

$$\psi_{net}(\nu; \rho, \delta) < -\epsilon, \qquad \nu \in [\delta, 1].$$

Combining this with (24), for all $s = 0, 1, 2, \ldots, (n-m)/2$ and all $n \ge n_0(\delta, \rho, \epsilon)$,

$$n^{-1} \log(D_s) < -\epsilon.$$

If this negative exponent condition holds, we obtain the results in Theorem 2.

In the next section, we specify the exponents $\psi_{int}$ and $\psi_{ext}$ for the internal and external angles respectively. We also discuss the properties of $\rho_N(\delta, C)$.

V. Characterizations of Angle Exponents 

A. Exponent for External Angle 

Let $G$ denote the cumulative distribution function of a half-normal $HN(0, 1/2)$ random variable. That is, for $X = |Z|$ with $Z \sim N(0, 1/2)$, $G(x) = \mathrm{Prob}\{X \le x\}$. The density function is $g(x) = \frac{2}{\sqrt{\pi}} \exp(-x^2)$, and thus $G(x)$ is the error function

$$G(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-y^2}\, dy. \qquad (25)$$

For $\nu \in (0, 1]$, define $x_\nu$ as the solution of

$$\frac{2x G(x)}{g(x)} = \frac{1 - \nu}{\nu'}, \qquad (26)$$

where

$$\nu' = (C^2 - 1)\rho\delta + \nu.$$

The function $xG(x)$ is smooth and strictly increasing: it goes to 0 as $x \to 0$ and behaves like $x$ as $x \to \infty$. Since $g(x)$ is strictly decreasing, $2xG(x)/g(x)$ is strictly increasing. So $x_\nu$ is a well-defined, smooth, and decreasing function of $\nu$. We have $x_\nu \to 0$ as $\nu \to 1$. When $C = 1$, also $x_\nu \sim \sqrt{\log((1-\nu)/\nu')}$ as $\nu \to 0$. Define now

$$\psi_{ext}(\nu) = -(1 - \nu) \log(G(x_\nu)) + \nu' x_\nu^2.$$

This function is smooth on the interior of $(0, 1)$, with endpoint value $\psi_{ext}(1) = 0$ (and $\psi_{ext}(0) = 0$ when $C = 1$). When $C = 1$, we have the asymptotic [Don06b]

$$\psi_{ext}(\nu) \sim \nu \log\left(\frac{1}{\nu}\right) - \frac{1}{2} \nu \log\left(\log\left(\frac{1}{\nu}\right)\right) + o(\nu), \qquad \nu \to 0. \qquad (27)$$
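Numerically, $\psi_{ext}$ only requires solving the scalar equation (26). The sketch below is a minimal Python illustration (standard library only, hypothetical helper names). It finds $x_\nu$ by bisection, which is valid because $2xG(x)/g(x)$ is strictly increasing, and then evaluates $\psi_{ext}$. It also exposes the function $\nu' y^2 - (1-\nu)\log G(y)$, whose minimum over $y > 0$ is attained at $x_\nu$. The parameter values are arbitrary examples with $\nu \ge \delta$.

```python
import math


def ratio(x):
    """2 x G(x) / g(x) with G = erf and g(x) = (2 / sqrt(pi)) exp(-x^2)."""
    return math.sqrt(math.pi) * x * math.erf(x) * math.exp(x * x)


def solve_x_nu(nu, nu_p, iters=200):
    """Root x_nu of (26) by bisection; ratio(x) is strictly increasing from 0."""
    target = (1.0 - nu) / nu_p
    lo, hi = 0.0, 1.0
    while ratio(hi) < target:   # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def psi_nu(y, nu, nu_p):
    """nu' y^2 - (1 - nu) log G(y); its minimum value over y > 0 is psi_ext(nu)."""
    return nu_p * y * y - (1.0 - nu) * math.log(math.erf(y))


def psi_ext(nu, rho, delta, C):
    """External-angle exponent psi_ext(nu) = -(1 - nu) log G(x_nu) + nu' x_nu^2."""
    nu_p = (C * C - 1.0) * rho * delta + nu
    return psi_nu(solve_x_nu(nu, nu_p), nu, nu_p)


rho, delta, C = 0.2, 0.5, 2.0
for nu in (0.5, 0.7, 0.9):
    print(nu, psi_ext(nu, rho, delta, C))
```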

B. Exponent for Internal Angle 

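Closely following [Don06b], let $Y$ be a standard half-normal random variable $HN(0, 1)$. A standard calculation shows that its cumulant generating function $\Lambda(s) = \log(E(\exp(sY)))$ is

$$\Lambda(s) = \frac{s^2}{2} + \log(2\Phi(s)),$$

where $\Phi$ is the usual cumulative distribution function of a standard normal $N(0, 1)$. The large deviation rate function $\Lambda^*$ associated with this cumulant generating function is defined as

$$\Lambda^*(y) = \max_s\; sy - \Lambda(s).$$

From large deviation theory, this function is smooth and convex on $(0, \infty)$. It is strictly positive except at $\mu = E(Y) = \sqrt{2/\pi}$, where it equals 0. For $\gamma' \in (0, 1)$ let

$$\xi_{\gamma'}(y) = \frac{1 - \gamma'}{\gamma'} \frac{y^2}{2} + \Lambda^*(y), \qquad (28)$$

where we define

$$\gamma' = \frac{\rho\delta}{\frac{C^2-1}{C^2}\rho\delta + \frac{\nu}{C^2}} = \frac{C^2 \rho\delta}{(C^2-1)\rho\delta + \nu}.$$

The function $\xi_{\gamma'}(y)$ is strictly convex and positive on $(0, \infty)$. It has a unique minimum, attained at a unique $y_{\gamma'}$ in the interval $(0, \sqrt{2/\pi})$. The internal angle exponent is then

$$\psi_{int}(\nu; \rho, \delta) = \xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta) + \log(2)(\nu - \rho\delta). \qquad (29)$$

For fixed $\rho$ and $\delta$, $\psi_{int}$ is continuous in $\nu \ge \delta$. Most importantly, in Section VII-B below we derive the asymptotic formula

$$\xi_{\gamma'}(y_{\gamma'}) \sim \frac{1}{2} \log\left(\frac{1 - \gamma'}{\gamma'}\right), \qquad \gamma' \to 0. \qquad (30)$$

Because $\gamma' = \frac{C^2\rho\delta}{(C^2-1)\rho\delta + \nu}$, (30) implies the following. For small $\rho$, $\nu \in [\delta, 1]$ and any given $\eta > 0$,

$$\psi_{int}(\nu; \rho, \delta) \ge \left(\frac{1}{2} \log\left(\frac{1 - \gamma'}{\gamma'}\right)(1 - \eta) + \log(2)\right)(\nu - \rho\delta). \qquad (31)$$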
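The internal angle exponent (29) can be evaluated without computing $\Lambda^*$ directly. The sketch below (Python, standard library only, hypothetical names) parametrizes $y$ by the dual variable $s < 0$ through $y = \Lambda'(s)$, so that $\Lambda^*(y) = sy - \Lambda(s)$ by convex duality. It then minimizes $\xi_{\gamma'}$ over $s$ with a ternary search. The search interval $s \in [-30, 0)$ is an assumption that covers moderate values of $\gamma'$. For small $\gamma'$ the minimizer satisfies $|s| \approx \sqrt{(1-\gamma')/\gamma'}$, so very small $\gamma'$ would need a wider interval.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def Phi(s):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-s / math.sqrt(2.0))


def Lam(s):
    """Cumulant generating function of HN(0, 1): s^2 / 2 + log(2 Phi(s))."""
    return 0.5 * s * s + math.log(2.0 * Phi(s))


def dLam(s):
    """Lambda'(s) = s + phi(s) / Phi(s): the point y with (Lambda*)'(y) = s."""
    return s + math.exp(-0.5 * s * s) / SQRT_2PI / Phi(s)


def xi_of_s(s, g):
    """xi_{gamma'}(y) of (28) at y = Lambda'(s), using Lambda*(y) = s y - Lambda(s)."""
    y = dLam(s)
    return (1.0 - g) / g * y * y / 2.0 + s * y - Lam(s)


def min_xi(g, lo=-30.0, hi=-1e-9, iters=200):
    """Ternary search over s < 0 (y(s) is increasing, so xi(y(s)) is unimodal)."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if xi_of_s(m1, g) < xi_of_s(m2, g):
            hi = m2
        else:
            lo = m1
    s = 0.5 * (lo + hi)
    return xi_of_s(s, g), dLam(s)   # (xi_{gamma'}(y_{gamma'}), y_{gamma'})


def psi_int(nu, rho, delta, C):
    """Internal-angle exponent (29), gamma' = C^2 rho delta / ((C^2 - 1) rho delta + nu)."""
    rd = rho * delta
    g = C * C * rd / ((C * C - 1.0) * rd + nu)
    xi_min, _ = min_xi(g)
    return (xi_min + math.log(2.0)) * (nu - rd)


print(psi_int(0.6, 0.2, 0.5, 2.0))
```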
C. Properties of ρ_N(δ, C)

We now consider the combined behavior of $\psi_{com}$, $\psi_{int}$ and $\psi_{ext}$, viewed as functions of $\nu$ with $\rho$ and $\delta$ as parameters. $\psi_{com}$ is the exponent of a growing function. It must be outweighed by the sum of the other two exponents, $\psi_{int} + \psi_{ext}$.

The asymptotic relations (27) and (30) yield the following key facts about $\rho_N(\delta, C)$. Their proofs are given in the appendix.

Lemma 3: For any $\delta > 0$ and any $C > 1$, we have

$$\rho_N(\delta, C) > 0, \qquad \delta \in (0, 1). \qquad (32)$$

Generalizing the result in [Don06b] for $\rho_N(\delta, 1)$, one can show that $\rho_N(\delta, C) \to 0$ as $\delta \to 0$. The rate is controlled by the following lower bound.

Lemma 4: For all $\eta > 0$ and any $C > 1$,

$$\rho_N(\delta, C) \ge \log\left(\frac{1}{\delta}\right)^{-(1+\eta)}, \qquad \delta \to 0. \qquad (33)$$

Finally, the following lower and upper bounds on $\rho_N(\delta, C)$ show how $\rho_N(\delta, C)$ scales as a function of $C$.

Lemma 5: When $C > 1$, for any fixed $\delta > 0$,

$$\Omega\left(\frac{1}{C^2}\right) \le \rho_N(\delta, C) \le \frac{1}{C+1}, \qquad (34)$$

where $\Omega(1/C^2) \le \rho_N(\delta, C)$ means that there exists a constant $\iota(\delta)$ such that

$$\frac{\iota(\delta)}{C^2} \le \rho_N(\delta, C), \qquad C \to \infty,$$

and we can take $\iota(\delta) = \rho_N(\delta, 1)$.

VI. Deriving the External Angle Exponents 

The previous section described how to compute the external and internal angle exponents. We now give the derivations that justify these computations, starting with $\psi_{ext}$ from Section V.

Lemma 6: Fix $\delta, \epsilon_1 > 0$. Then

$$n^{-1} \log(\gamma(G, SP)) < -\psi_{ext}(l/n) + \epsilon_1, \qquad (35)$$

uniformly in $l \ge \delta n$, when $n$ is large enough.

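Proof: In the appendix, we derive the explicit integral formula for the external angle:

$$\gamma(G, SP) = \frac{2^{n-l}}{\sqrt{\pi}^{\,n-l+1}} \int_{0}^{\infty} e^{-x^2} \left( \int_{0}^{x/\sqrt{(C^2-1)k + l}} e^{-y^2}\, dy \right)^{n-l} dx. \qquad (36)$$

After the change of variables $x = \sqrt{(C^2-1)k + l}\; y$, we have

$$\gamma(G, SP) = \sqrt{\frac{(C^2-1)k + l}{\pi}} \int_{0}^{\infty} e^{-((C^2-1)k + l)y^2} \left( \frac{2}{\sqrt{\pi}} \int_{0}^{y} e^{-x^2}\, dx \right)^{n-l} dy. \qquad (37)$$

The expression inside the parentheses is the error function $G$ from (25). Let $\nu = l/n$ and $\nu' = (C^2-1)\rho\delta + \nu$. Then the integral formula can be written as

$$\gamma(G, SP) = \sqrt{\frac{n\nu'}{\pi}} \int_{0}^{\infty} e^{-n\nu' y^2 + n(1-\nu)\log(G(y))}\, dy. \qquad (38)$$

To study the asymptotic behavior of (38), we use Laplace's method, following the same methodology as in [Don06b]. We define

$$f_{\rho,\delta,\nu,n}(y) = e^{-n\psi_{\rho,\delta,\nu}(y)} \sqrt{\frac{n\nu'}{\pi}} \qquad (39)$$

with

$$\psi_{\rho,\delta,\nu}(y) = \nu' y^2 - (1-\nu)\log(G(y)).$$

We develop expressions for the second and third derivatives of $\psi_{\rho,\delta,\nu}$ below. Applying Laplace's method to $\psi_{\rho,\delta,\nu}$ gives the following lemma; its proof is deferred to later parts of the paper.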

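Lemma 7: For $\nu \in (0, 1)$, let $x_\nu$ denote the minimizer of $\psi_{\rho,\delta,\nu}$. Then

$$\int_{0}^{\infty} f_{\rho,\delta,\nu,n}(x)\, dx \le e^{-n\psi_{\rho,\delta,\nu}(x_\nu)(1 + R_n(\nu))},$$

where, for $\delta, \eta > 0$,

$$\sup_{\nu \in [\delta, 1-\eta]} R_n(\nu) = o(1) \quad \text{as } n \to \infty,$$

and $x_\nu$ is exactly the $x_\nu$ defined in (26).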
Recall that the exponent $\psi_{ext}$ is given by

$$\psi_{ext}(\nu; \rho, \delta) = \psi_{\rho,\delta,\nu}(x_\nu). \qquad (40)$$

From the definition of $\psi_{\rho,\delta,\nu}(x_\nu)$ and (40), as $\nu \to 1$ we have $x_\nu \to 0$ and $\psi_{ext}(\nu) \to 0$. For any given $\epsilon_1 > 0$ in Lemma 6, there is a largest $\nu_{\epsilon_1} < 1$ with $\psi_{ext}(\nu_{\epsilon_1}) = \epsilon_1$. Since $\gamma(G, SP) \le 1$, for $l > \nu_{\epsilon_1} n$,

$$n^{-1} \log(\gamma(G, SP)) \le 0 < -\psi_{ext}(\nu) + \epsilon_1,$$

for $n \ge 1$. Consider now $l \in [\delta n, \nu_{\epsilon_1} n]$. Based on (38),

$$\gamma(G, SP) = \int_{0}^{\infty} f_{\rho,\delta,\nu,n}(y)\, dy.$$

From Lemma 7, as $n \to \infty$, uniformly for $l \in [\delta n, \nu_{\epsilon_1} n]$,

$$n^{-1} \log(\gamma(G, SP)) \le -\psi_\nu(x_\nu) + o(1),$$

where $\psi_\nu(\cdot)$ abbreviates $\psi_{\rho,\delta,\nu}(\cdot)$ for fixed $\rho$ and $\delta$. So from the identity (40), we get

$$n^{-1} \log(\gamma(G, SP)) \le -\psi_{ext}(l/n) + o(1). \qquad (41)$$

Lemma 6 follows.

■ 
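Lemmas 6 and 7 can be illustrated numerically. For fixed $\nu$, $\rho$, $\delta$ and $C$, the quantity $n^{-1}\log\gamma(G, SP) + \psi_{ext}(\nu)$ should shrink as $n$ grows. The sketch below (Python, standard library only; helper names are hypothetical) evaluates (38) in the log domain, because $\gamma(G, SP)$ underflows double precision for moderately large $n$. It truncates the integral to $[10^{-3}, 3]$, an assumption that is adequate for the parameters shown. It also takes $k = \rho\delta n$ and $l = \nu n$ to be integers, so that $n\nu' = (C^2-1)k + l$ exactly.

```python
import math


def psi(y, nu, nu_p):
    """Exponent of (38)-(39): nu' y^2 - (1 - nu) log G(y), with G = erf."""
    return nu_p * y * y - (1.0 - nu) * math.log(math.erf(y))


def solve_x_nu(nu, nu_p):
    """Root of (26), 2 x G(x) / g(x) = (1 - nu) / nu', by bisection on [0, 4]."""
    f = lambda x: math.sqrt(math.pi) * x * math.erf(x) * math.exp(x * x) - (1.0 - nu) / nu_p
    lo, hi = 0.0, 4.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_gamma_38(n, nu, nu_p, a=1e-3, b=3.0, panels=20000):
    """log gamma(G, SP) from (38), integrated in the log domain with Simpson weights."""
    h = (b - a) / panels
    terms = []
    for i in range(panels + 1):
        w = 1.0 if i in (0, panels) else (4.0 if i % 2 else 2.0)
        terms.append(math.log(w * h / 3.0) - n * psi(a + i * h, nu, nu_p))
    top = max(terms)   # log-sum-exp avoids underflow of exp(-n psi)
    return 0.5 * math.log(n * nu_p / math.pi) + top + math.log(sum(math.exp(t - top) for t in terms))


def laplace_gap(n, nu, rho, delta, C):
    """n^{-1} log gamma(G, SP) + psi_ext(nu); Lemmas 6 and 7 say this tends to 0."""
    nu_p = (C * C - 1.0) * rho * delta + nu
    psi_ext = psi(solve_x_nu(nu, nu_p), nu, nu_p)
    return log_gamma_38(n, nu, nu_p) / n + psi_ext


for n in (1000, 4000):
    print(n, laplace_gap(n, nu=0.6, rho=0.2, delta=0.5, C=2.0))
```

A second-order Laplace expansion suggests that the gap behaves like $\frac{1}{2n}\log(2\nu'/\psi_\nu''(x_\nu))$, which is of order $1/n$.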

It remains to prove the uniformity result for Laplace's method in Lemma 7. We follow the same line of reasoning as in [Don06b]. First, we state the key lemma from [Don06b] explicitly.

Lemma 8 [Don06b]: Let $\psi(x)$ be convex in $x$ and belong to the differentiability class $C^2$ (the second derivative exists and is continuous) on an interval $I$. Suppose that $\psi$ takes its minimum at an interior point $x_0 \in I$, where $\psi''(x_0) > 0$, and that in a vicinity $(x_0 - \epsilon, x_0 + \epsilon)$ of $x_0$,

$$|\psi''(x) - \psi''(x_0)| \le D |\psi''(x_0)| |x - x_0|. \qquad (42)$$

Let $\tilde{\psi}$ be the quadratic approximation $\psi(x_0) + \psi''(x_0)(x - x_0)^2/2$. Then

$$\int_{I} \exp(-n\psi(x))\, dx \le \int_{-\infty}^{\infty} \exp(-n\tilde{\psi}(x))\, dx \cdot (S_{1,n} + S_{2,n}),$$

where

$$S_{1,n} = \exp(n\psi''(x_0) D \epsilon^3 / 6), \qquad S_{2,n} = \frac{2}{\sqrt{n\epsilon^2\, 2\pi\, \psi''(x_0)}\,(1 - D\epsilon)}.$$
The constant $D$ in this lemma can be a scaled third derivative: if $\psi$ is $C^3$, we can take

$$D = \sup_{x \in (x_0 - \epsilon, x_0 + \epsilon)} \left| \psi^{(3)}(x) / \psi''(x) \right|.$$

Lemma 8 yields the uniformity in Lemma 7. Pick $\epsilon_n = n^{-2/5}$ and let $n > n_1(\psi''(x_0), D)$, where $n_1(\psi''(x_0), D)$ depends only on $\psi''(x_0)$ and $D$. Then

$$\int_{I} e^{-n\psi(x)}\, dx \le \int_{-\infty}^{\infty} e^{-n\tilde{\psi}(x)}\, dx \cdot (1 + o(1)). \qquad (43)$$

Here the $o(1)$ term is uniform over any collection of convex functions with a given $\psi''(x_0)$ and $D$. From here to the end of this section, we abbreviate $\psi_{\rho,\delta,\nu}$ as $\psi_\nu$ for fixed parameters $\rho$ and $\delta$.
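The role of $S_{1,n}$ and $S_{2,n}$ in Lemma 8 can be seen on a simple example of our own choosing, which is not from the paper: $\psi(x) = \log\cosh x$ on $I = [-5, 5]$. This function is convex with minimum at $x_0 = 0$, $\psi''(0) = 1$ and $|\psi^{(3)}(x)| \le 2\tanh|x|$, so $D = 2\tanh\epsilon$ satisfies (42). The Python sketch below (standard library only, hypothetical names) compares the two sides of Lemma 8 for $\epsilon_n = n^{-2/5}$. Here $\psi$ lies below its quadratic approximation, so the factor $S_{1,n} + S_{2,n}$ is genuinely needed. The factor approaches 1 only slowly, because $S_{2,n}$ decays like $n^{-1/10}$.

```python
import math


def simpson(f, a, b, panels=20000):
    """Composite Simpson rule on [a, b] (panels made even)."""
    panels += panels % 2
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def lemma8_sides(n):
    """Both sides of Lemma 8 for psi(x) = log cosh x on I = [-5, 5].

    Here x0 = 0, psi(x0) = 0, psi''(x0) = 1 and |psi'''(x)| <= 2 tanh|x|, so
    D = 2 tanh(eps) satisfies (42) on (-eps, eps); eps_n = n^(-2/5) as in the text.
    """
    eps = n ** -0.4
    D = 2.0 * math.tanh(eps)
    lhs = simpson(lambda x: math.exp(-n * math.log(math.cosh(x))), -5.0, 5.0)
    gauss = math.sqrt(2.0 * math.pi / n)          # integral of exp(-n x^2 / 2) over R
    S1 = math.exp(n * D * eps ** 3 / 6.0)
    S2 = 2.0 / (math.sqrt(n * eps ** 2 * 2.0 * math.pi) * (1.0 - D * eps))
    return lhs, gauss, S1 + S2


for n in (10, 100, 1000, 10000):
    lhs, gauss, factor = lemma8_sides(n)
    print(n, lhs, gauss * factor, factor)
```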

Now consider the collection of convex functions $\psi_\nu$ ($\nu \in [\delta, 1-\eta]$) in Lemma 7. Following the derivations in [Don06b], Lemma 7 holds if, for some $\epsilon > 0$, $\psi_\nu''(x_\nu)$ and $D$ are bounded for the functions $\psi_\nu(x)$ uniformly over $\nu \in [\delta, 1-\eta]$. This is guaranteed by Lemma 9 below.

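Lemma 9: The function $\psi_\nu(\cdot)$ is smooth, with second derivative at $x_\nu$

$$\psi_\nu''(x_\nu) = 2\nu' + (1-\nu)(2x_\nu z_\nu + z_\nu^2) = 2\nu' + 4\nu' x_\nu^2 + (1-\nu) z_\nu^2 \qquad (44)$$

and third derivative at $x_\nu$

$$\psi_\nu^{(3)}(x_\nu) = (1-\nu)\left((2 - 4x_\nu^2) z_\nu - 6x_\nu z_\nu^2 - 2z_\nu^3\right), \qquad (45)$$

where $z_\nu = 2\nu' x_\nu / (1-\nu)$. We have

$$0 < 2\delta \le \inf_{\nu \in [\delta, 1]} \psi_\nu''(x_\nu)$$

and

$$\sup_{\nu \in [\delta, 1-\eta]} \psi_\nu''(x_\nu) < \infty.$$

Moreover, for small enough $\epsilon > 0$, the ratio

$$D(\epsilon; \delta, \eta) = \sup_{\nu \in [\delta, 1-\eta]}\ \sup_{|x - x_\nu| < \epsilon} \left| \psi_\nu^{(3)}(x) / \psi_\nu''(x) \right|$$

is finite.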
Proof: The first, second and third derivatives of the function $\psi_\nu(x)$ are

$$\psi_\nu'(x) = -(1-\nu) g/G + 2\nu' x;$$
$$\psi_\nu''(x) = -(1-\nu)(g'/G - g^2/G^2) + 2\nu';$$
$$\psi_\nu^{(3)}(x) = -(1-\nu)(g''/G - 3g'g/G^2 + 2g^3/G^3).$$

We have $g' = (-2x)g$, $g'' = (-2 + 4x^2)g$, and, at the point $x_\nu$,

$$\frac{g(x_\nu)}{G(x_\nu)} = \frac{2\nu' x_\nu}{1-\nu} = z_\nu.$$

Substituting these gives (44) and (45) immediately.

Note that $\psi_\nu''(x_\nu) \ge 2\nu'$, so it is bounded away from zero on any interval $\nu \in [\delta, 1]$, $\delta > 0$. Also, $x_\nu$ is a continuous, and hence bounded, function of $\nu$ on the interval $[\delta, 1-\eta]$ ($\delta, \eta > 0$). Therefore $\psi_\nu''(x_\nu)$ is also bounded above over $[\delta, 1-\eta]$.

As for $\psi_\nu^{(3)}$, note that $x_\nu$ and $z_\nu$ are continuous functions on $[\delta, 1)$, and both are bounded on $[\delta, 1-\eta]$. Since $\psi_\nu^{(3)}(x_\nu)$ is a polynomial in $\nu$, $x_\nu$ and $z_\nu$, it is also bounded. On the interval $(x_\nu - \epsilon, x_\nu + \epsilon)$, inspection shows that the ratio $D(\epsilon; \delta, \eta)$ is also bounded uniformly over $\nu \in [\delta, 1-\eta]$, provided $\epsilon > 0$ is small enough.

■ 
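The closed forms (44) and (45) can be checked against finite differences. The sketch below (Python, standard library only, hypothetical names) computes $x_\nu$ from (26) by bisection and evaluates (44) and (45) with $z_\nu = 2\nu' x_\nu/(1-\nu)$. It then compares them with central differences of $\psi_\nu$. The choice $C = 2$, $\rho\delta = 0.1$ and the step size are arbitrary.

```python
import math


def solve_x_nu(nu, nu_p):
    """Root of (26) by bisection on [0, 4]."""
    f = lambda x: math.sqrt(math.pi) * x * math.erf(x) * math.exp(x * x) - (1.0 - nu) / nu_p
    lo, hi = 0.0, 4.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def closed_forms(nu, nu_p):
    """(44) and (45) at x_nu, with z_nu = 2 nu' x_nu / (1 - nu)."""
    x = solve_x_nu(nu, nu_p)
    z = 2.0 * nu_p * x / (1.0 - nu)
    d2 = 2.0 * nu_p + (1.0 - nu) * (2.0 * x * z + z * z)
    d3 = (1.0 - nu) * ((2.0 - 4.0 * x * x) * z - 6.0 * x * z * z - 2.0 * z ** 3)
    return x, d2, d3


def finite_differences(nu, nu_p, x, h=2e-4):
    """Central differences of psi_nu(y) = nu' y^2 - (1 - nu) log erf(y) at x."""
    p = lambda y: nu_p * y * y - (1.0 - nu) * math.log(math.erf(y))
    d2 = (p(x + h) - 2.0 * p(x) + p(x - h)) / h ** 2
    d3 = (p(x + 2 * h) - 2.0 * p(x + h) + 2.0 * p(x - h) - p(x - 2 * h)) / (2.0 * h ** 3)
    return d2, d3


for nu in (0.3, 0.6, 0.8):
    nu_p = 3.0 * 0.1 + nu   # C = 2 and rho * delta = 0.1, so nu' = (C^2 - 1) rho delta + nu
    x, d2, d3 = closed_forms(nu, nu_p)
    print(nu, (d2, d3), finite_differences(nu, nu_p, x))
```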

VII. Bounds on the Internal Angle 

In this section we derive the internal angle decay exponent; that is, we prove the following lemma:

Lemma 10: For $\epsilon > 0$ and $n > n_0(\rho, \delta, \epsilon)$,

$$n^{-1} \log(\beta(F, G)) \le -\psi_{int}(l/n; \rho, \delta) + \epsilon,$$

uniformly in $l \ge \delta n$, $k \le \rho\delta n$, $(l - k) \ge (\nu - \delta\rho)n$.

Using the formula for the internal angle derived in the appendix, we know that

$$-n^{-1} \log(\beta(F, G)) = -n^{-1} \log\left(B\left(\frac{1}{C^2k + 1},\, l - k\right)\right), \qquad (46)$$

where

$$B(\alpha', m') = \theta^{\frac{m'-1}{2}} \sqrt{(m'-1)\alpha' + 1}\; \pi^{-m'/2}\, \alpha'^{-1/2}\, J(m', \theta), \qquad (47)$$

with $\theta = (1-\alpha')/\alpha'$ and

$$J(m', \theta) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \left( \int_{0}^{\infty} e^{-\theta v^2 + 2 i v \lambda}\, dv \right)^{m'} e^{-\lambda^2}\, d\lambda. \qquad (48)$$

To evaluate (46), we need to evaluate the complex integral in $J(m', \theta)$. Two methods exist for similar integral expressions. [VS92] sketched a saddle point method based on contour integration. [Don06b] developed a probabilistic method using large deviation theory. Both methods apply in our case and produce the same final results. In this paper we follow the probabilistic method of [Don06b]. The basic idea is to view the integral in $J(m', \theta)$ as the convolution of $(m'+1)$ probability densities expressed in the Fourier domain. In [Don06b], reaching this probabilistic method required mechanical manipulations of the characteristic functions of the normal and half-normal distributions. In the appendix of this paper, we give a derivation of the internal angle formula that leads naturally to this probabilistic method and explains its physical meaning.

More explicitly, we have the following lemma:

Lemma 11: Let $\theta = (1-\alpha')/\alpha'$, where $\alpha' = \frac{1}{C^2k + 1}$. Let $T$ be a random variable with the $N(0, \frac{1}{2})$ distribution, and let $W_{m'}$ be a sum of $m'$ i.i.d. half-normals $U_i \sim HN(0, \frac{1}{2\theta})$. Let $T$ and $W_{m'}$ be stochastically independent, and let $g_{T+W_{m'}}$ denote the probability density function of the random variable $T + W_{m'}$. Then

$$B(\alpha', m') = \sqrt{\frac{\alpha'(m'-1) + 1}{1 - \alpha'}} \cdot 2^{-m'} \cdot \sqrt{\pi} \cdot g_{T+W_{m'}}(0). \qquad (49)$$
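Lemma 11 can be sanity-checked in the two smallest cases, where the internal angle is elementary. For $m' = 1$ the polyhedral part of the cone is a single ray, so $B(\alpha', 1) = 1/2$. For $m' = 2$ it is a planar wedge spanned by two unit vectors with inner product $\alpha'$, so $B(\alpha', 2) = \arccos(\alpha')/(2\pi)$. The Python sketch below (standard library only, hypothetical names) computes $g_{T+W_{m'}}(0)$ by direct Simpson quadrature and evaluates the right-hand side of (49). It only checks these two cases and is not a general implementation. In both cases the result is consistent with the factor $2^{-m'}$ discussed in the footnote.

```python
import math


def simpson_weights(panels):
    return [1.0 if i in (0, panels) else (4.0 if i % 2 else 2.0) for i in range(panels + 1)]


def g_at_zero(m_prime, theta, L=6.0, panels=400):
    """Density of T + W_{m'} at 0, by quadrature, for m' in {1, 2}.

    T ~ N(0, 1/2) has density exp(-t^2) / sqrt(pi); each U_i ~ HN(0, 1/(2 theta)) has
    density 2 sqrt(theta / pi) exp(-theta u^2) on u >= 0.  Since T is symmetric,
    g(0) = E[f_T(W_{m'})] = integral over u >= 0 of f_T(u_1 + ... + u_m') prod f_U(u_i).
    """
    f_T = lambda t: math.exp(-t * t) / math.sqrt(math.pi)
    f_U = lambda u: 2.0 * math.sqrt(theta / math.pi) * math.exp(-theta * u * u)
    h, w = L / panels, simpson_weights(panels)
    if m_prime == 1:
        return sum(w[i] * f_T(i * h) * f_U(i * h) for i in range(panels + 1)) * h / 3.0
    if m_prime == 2:
        total = 0.0
        for i in range(panels + 1):
            for j in range(panels + 1):
                total += w[i] * w[j] * f_T((i + j) * h) * f_U(i * h) * f_U(j * h)
        return total * (h / 3.0) ** 2
    raise ValueError("this sketch only handles m' = 1 or m' = 2")


def B_from_49(alpha, m_prime):
    """Right-hand side of (49): sqrt((alpha (m'-1) + 1) / (1 - alpha)) 2^{-m'} sqrt(pi) g(0)."""
    theta = (1.0 - alpha) / alpha
    return (math.sqrt((alpha * (m_prime - 1) + 1.0) / (1.0 - alpha))
            * 2.0 ** (-m_prime) * math.sqrt(math.pi) * g_at_zero(m_prime, theta))


alpha = 1.0 / (2.0 ** 2 * 1 + 1)   # alpha' = 1/(C^2 k + 1) with C = 2, k = 1
print(B_from_49(alpha, 1), 0.5)
print(B_from_49(alpha, 2), math.acos(alpha) / (2.0 * math.pi))
```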

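Using this probabilistic interpretation and large deviation techniques, [Don06b] shows that

$$g_{T+W_{m'}}(0) \le \frac{2}{\sqrt{\pi}} \left( \int_{0}^{\mu_{m'}} v\, e^{-v^2 - m'\Lambda^*\left(\frac{\sqrt{2\theta}}{m'} v\right)}\, dv + e^{-\mu_{m'}^2} \right), \qquad (50)$$

where $\Lambda^*$ is the rate function of the standard half-normal random variable $HN(0, 1)$ and $\mu_{m'}$ is the expectation of $W_{m'}$. The second term in the sum is argued in [Don06b] to be negligible. After the substitution $y = \frac{\sqrt{2\theta}}{m'} v$, we have an upper bound for the first term:

$$\frac{2}{\sqrt{\pi}} \cdot \frac{m'^2}{2\theta} \int_{0}^{\sqrt{2/\pi}} y\, e^{-m'\left(\frac{m'}{2\theta}\right) y^2 - m'\Lambda^*(y)}\, dy. \qquad (51)$$

Footnote: In [Don06b], the term $2^{-m'}$ in (49) was $2^{1-m'}$, but we believe that $2^{-m'}$ is the right term.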
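A. Laplace's Method for ψ_int

Recall that $m'$ in the exponent of (51) is defined as $(l - k)$. As for the external angle decay exponent, we again use Laplace's method to evaluate the internal angle decay exponent. The function $\xi_{\gamma'}$ of (28) appears in the exponent of (51), with $\gamma' = \frac{\theta}{m' + \theta}$, since $\frac{m'}{2\theta} y^2 = \frac{1-\gamma'}{\gamma'} \frac{y^2}{2}$. Since $\theta = \frac{1-\alpha'}{\alpha'} = C^2 k$, we have

$$\gamma' = \frac{\theta}{m' + \theta} = \frac{C^2 k}{(C^2-1)k + l}.$$

Since $k \sim \rho\delta n$ and $l \sim \nu n$,

$$\gamma' \sim \frac{\rho\delta}{\frac{C^2-1}{C^2}\rho\delta + \frac{\nu}{C^2}}.$$

Define the function

$$f_{\gamma', m'}(y) = \frac{2}{\sqrt{\pi}} \cdot \frac{m'^2}{2\theta}\, y\, e^{-m'\xi_{\gamma'}(y)},$$

where $\xi_{\gamma'}(y)$ is the function defined in (28). Then (51) equals $\int_0^{\sqrt{2/\pi}} f_{\gamma', m'}(y)\, dy$.

Arguments similar to those in the proof of Lemma 7 give the following lemma.

Lemma 12: For $\gamma' \in (0, 1]$, let $y_{\gamma'} \in (0, 1)$ denote the minimizer of $\xi_{\gamma'}$. Then

$$\int_{0}^{\sqrt{2/\pi}} f_{\gamma', m'}(x)\, dx \le e^{-m'\xi_{\gamma'}(y_{\gamma'})} \cdot R_{m'}(\gamma'),$$

where, for $\eta > 0$,

$$m'^{-1} \sup_{\gamma' \in [\eta, 1]} \log(R_{m'}(\gamma')) = o(1) \quad \text{as } m' \to \infty.$$

This means that

$$g_{T+W_{m'}}(0) \le e^{-m'\xi_{\gamma'}(y_{\gamma'})} R_{m'}(\gamma').$$

So, applying (49), we get

$$n^{-1} \log(\beta(F, G)) \le \left(-\xi_{\gamma'}(y_{\gamma'}) - \log(2)\right)(\nu - \rho\delta) + o(1),$$

where the $o(1)$ is uniform over the range of $k$ and $l$.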
B. Asymptotics of ξ_γ'

As before, define $\gamma' = \frac{\theta}{m' + \theta}$, so $\gamma'$ can take any value in $(0, 1]$. We now study the asymptotics of $\xi_{\gamma'}(y_{\gamma'})$ as $\gamma' \to 0$. As in [Don06b], the convex duality between the cumulant generating function $\Lambda(s)$ and its dual $\Lambda^*$ gives

$$y = \Lambda'(s), \qquad s = (\Lambda^*)'(y).$$

This defines a one-to-one relationship $s = s(y)$, $y = y(s)$ between $s < 0$ and $0 < y < \sqrt{2/\pi}$.

From these relations, following the same line of reasoning as in [Don06b], the minimizer $y_{\gamma'}$ of $\xi_{\gamma'}$ satisfies

$$\frac{1-\gamma'}{\gamma'} y_{\gamma'} = -s_{\gamma'}, \qquad (52)$$

where $s_{\gamma'} = s(y_{\gamma'})$.

The cumulant generating function of a standard half-normal $HN(0, 1)$ random variable $Y$ is $\Lambda(s) = s^2/2 + \log(2\Phi(s))$, where $\phi$ and $\Phi$ are the standard normal density and cumulative distribution function. From $y = \Lambda'(s)$ we therefore have

$$y(s) = s\left(1 - \frac{1}{M(s)}\right), \qquad s < 0, \qquad (53)$$

where $M(s)$ is defined on $s < 0$ by

$$\Phi(s) = M(s) \cdot \frac{\phi(s)}{|s|},$$

and satisfies $0 < M(s) < 1$ and $M(s) \to 1$ as $s \to -\infty$.

Combining (52) and (53), we know that

$$M(s_{\gamma'}) = 1 - \gamma'. \qquad (54)$$

Further, we can derive that

$$\Lambda(s) = \log\left(\frac{2M(s)}{\sqrt{2\pi}\,|s|}\right).$$

So, by the property of the function $M(s)$ and (54), $s_{\gamma'} \to -\infty$ as $\gamma' \to 0$, and

$$\xi_{\gamma'}(y_{\gamma'}) = -\frac{1-\gamma'}{2\gamma'} y_{\gamma'}^2 - \frac{1}{2}\log(2/\pi) + \log(y_{\gamma'}/\gamma'). \qquad (55)$$
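Taking the logarithm of $\Lambda(s)$ shows that $\Lambda(s) \sim -\log|s|$ and $\Lambda'(s) \sim -\frac{1}{s}$ as $s \to -\infty$. So by $y = \Lambda'(s)$, we have

$$y(s) \sim \frac{1}{|s|}, \qquad s \to -\infty.$$

Combining this with (52), as $\gamma' \to 0$,

$$y_{\gamma'} \sim \sqrt{\frac{\gamma'}{1-\gamma'}}.$$

Substituting this into (55) gives $\xi_{\gamma'}(y_{\gamma'}) = \frac{1}{2}\log\left(\frac{1}{\gamma'(1-\gamma')}\right) + O(1) = \frac{1}{2}\log\left(\frac{1-\gamma'}{\gamma'}\right) + O(1)$, which yields (30).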
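The identities (52)-(55) and the asymptotic (30) can be checked numerically. The following Python sketch (standard library only, hypothetical names) locates the minimizer of $\xi_{\gamma'}$ through the dual variable $s$, using $y = \Lambda'(s)$ and $\Lambda^*(y) = sy - \Lambda(s)$. It reports the residuals of (52)-(55) together with the gap $\xi_{\gamma'}(y_{\gamma'}) - \frac{1}{2}\log((1-\gamma')/\gamma')$, which should stay bounded as $\gamma' \to 0$. The search interval $s \in [-30, 0)$ is an assumption suitable for the values of $\gamma'$ shown.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def phi(s):
    return math.exp(-0.5 * s * s) / SQRT_2PI


def Phi(s):
    return 0.5 * math.erfc(-s / math.sqrt(2.0))


def Lam(s):
    """Lambda(s) = s^2 / 2 + log(2 Phi(s)) for HN(0, 1)."""
    return 0.5 * s * s + math.log(2.0 * Phi(s))


def y_of_s(s):
    """y(s) = Lambda'(s) = s + phi(s) / Phi(s)."""
    return s + phi(s) / Phi(s)


def M(s):
    """The factor in Phi(s) = M(s) phi(s) / |s|, s < 0."""
    return Phi(s) * abs(s) / phi(s)


def xi_of_s(s, g):
    """xi_{gamma'}(y(s)), using the duality Lambda*(y(s)) = s y(s) - Lambda(s)."""
    y = y_of_s(s)
    return (1.0 - g) / g * y * y / 2.0 + s * y - Lam(s)


def s_at_minimum(g, lo=-30.0, hi=-1e-9, iters=200):
    """Ternary search for the s < 0 that minimizes xi_{gamma'}(y(s))."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if xi_of_s(m1, g) < xi_of_s(m2, g):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def residuals(g):
    """Residuals of (52), (53), (54), (55) at the numerical minimizer, and the gap in (30)."""
    s = s_at_minimum(g)
    y = y_of_s(s)
    r52 = (1.0 - g) / g * y + s
    r53 = y - s * (1.0 - 1.0 / M(s))
    r54 = M(s) - (1.0 - g)
    rhs55 = -(1.0 - g) / (2.0 * g) * y * y - 0.5 * math.log(2.0 / math.pi) + math.log(y / g)
    r55 = xi_of_s(s, g) - rhs55
    gap30 = xi_of_s(s, g) - 0.5 * math.log((1.0 - g) / g)
    return r52, r53, r54, r55, gap30


for g in (0.5, 0.2, 0.05, 0.01):
    print(g, residuals(g))
```

By (55) and $y_{\gamma'}^2 \sim \gamma'/(1-\gamma')$, the gap should approach $-\frac{1}{2}(1 + \log(2/\pi)) \approx -0.27$ as $\gamma' \to 0$.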

VIII. "Weak", "Sectional" and "Strong" Robustness 

So far, we have discussed the robustness of $\ell_1$ minimization for sparse signal recovery in the "strong" case. That is, we required robust signal recovery for all approximately $k$-sparse signal vectors $x$. In applications and performance analysis, however, we are often also interested in signal recovery robustness in weaker senses. For example, we may want a tighter performance bound for a particular signal vector, instead of a more general but looser bound for all possible signal vectors. As we shall see, the framework of the previous sections extends naturally to these other notions of robustness, resulting in a coherent analysis scheme. In this section, we present null space conditions on the matrix $A$ that guarantee the performance of the program (2) in the "weak", "sectional" and "strong" senses. Robustness in the "strong" sense is exactly the robustness discussed in the previous sections.

Theorem 3: Let $A$ be a general $m \times n$ measurement matrix, $x$ be an $n$-element vector and $y = Ax$. Let $K$ be a subset of $\{1, 2, \ldots, n\}$ with cardinality $|K| = k$, and let $\bar{K} = \{1, 2, \ldots, n\} \setminus K$. Let $w$ denote an $n \times 1$ vector. Let $C > 1$ be a fixed number.

• (Weak Robustness) Fix a specific set $K$ and suppose that the part of $x$ on $K$, namely $x_K$, is fixed. Then for all $x_{\bar{K}}$, any solution $\hat{x}$ produced by (2) satisfies

$$\|x_K\|_1 - \|\hat{x}_K\|_1 \le \frac{2}{C-1} \|x_{\bar{K}}\|_1$$

and

$$\|(x - \hat{x})_{\bar{K}}\|_1 \le \frac{2C}{C-1} \|x_{\bar{K}}\|_1,$$

if and only if for all $w \in \mathbb{R}^n$ such that $Aw = 0$, we have

$$\|x_K + w_K\|_1 + \left\|\frac{w_{\bar{K}}}{C}\right\|_1 \ge \|x_K\|_1; \qquad (56)$$
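• (Sectional Robustness) Fix a specific set $K \subseteq \{1, 2, \ldots, n\}$. For all $x \in \mathbb{R}^n$, any solution $\hat{x}$ produced by (2) satisfies

$$\|x - \hat{x}\|_1 \le \frac{2(C+1)}{C-1} \|x_{\bar{K}}\|_1,$$

if and only if for all $x' \in \mathbb{R}^n$ and all $w \in \mathbb{R}^n$ such that $Aw = 0$,

$$\|x'_K + w_K\|_1 + \left\|\frac{w_{\bar{K}}}{C}\right\|_1 \ge \|x'_K\|_1; \qquad (57)$$

• (Strong Robustness) For all possible $K \subseteq \{1, 2, \ldots, n\}$ with $|K| = k$ and all $x \in \mathbb{R}^n$, any solution $\hat{x}$ produced by (2) satisfies

$$\|x - \hat{x}\|_1 \le \frac{2(C+1)}{C-1} \|x_{\bar{K}}\|_1,$$

if and only if for all $K \subseteq \{1, 2, \ldots, n\}$, all $x' \in \mathbb{R}^n$, and all $w \in \mathbb{R}^n$ such that $Aw = 0$,

$$\|x'_K + w_K\|_1 + \left\|\frac{w_{\bar{K}}}{C}\right\|_1 \ge \|x'_K\|_1. \qquad (58)$$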

Proof: We first show that the null space conditions are sufficient for the various definitions of robustness, starting with "weak" robustness. Let $w = \hat{x} - x$, so that $Aw = A(\hat{x} - x) = 0$ and $\hat{x}_K = x_K + w_K$. From the triangle inequality for the $\ell_1$ norm and the fact that $\|x\|_1 \ge \|x + w\|_1$, we have

$$\|x_K\|_1 - \|x_K + w_K\|_1 \ge \|w_{\bar{K}} + x_{\bar{K}}\|_1 - \|x_{\bar{K}}\|_1 \ge \|w_{\bar{K}}\|_1 - 2\|x_{\bar{K}}\|_1.$$

But the condition (56) guarantees that

$$\|w_{\bar{K}}\|_1 \ge C(\|x_K\|_1 - \|x_K + w_K\|_1).$$

Combining the two displays gives $\|w_{\bar{K}}\|_1 \ge C(\|w_{\bar{K}}\|_1 - 2\|x_{\bar{K}}\|_1)$, so

$$\|w_{\bar{K}}\|_1 \le \frac{2C}{C-1}\|x_{\bar{K}}\|_1$$

and, by (56) again,

$$\|x_K\|_1 - \|\hat{x}_K\|_1 \le \frac{1}{C}\|w_{\bar{K}}\|_1 \le \frac{2}{C-1}\|x_{\bar{K}}\|_1.$$

For "sectional" robustness, we again let $w = \hat{x} - x$. Then there exists an $x' \in \mathbb{R}^n$ such that

$$\|x'_K + w_K\|_1 = \|x'_K\|_1 - \|w_K\|_1.$$
By the condition (57), we have

$$\|w_K\|_1 \le \left\|\frac{w_{\bar{K}}}{C}\right\|_1.$$

Since

$$\|x\|_1 \ge \|x + w\|_1,$$

the argument in the proof of Theorem 1 gives

$$\|x - \hat{x}\|_1 \le \frac{2(C+1)}{C-1} \|x_{\bar{K}}\|_1.$$

The sufficiency of the condition (58) for strong robustness follows in the same way.

Necessity: Equality can be achieved in the triangle inequalities used in the sufficiency proof. Hence the conditions (56), (57) and (58) are also necessary for the respective robustness to hold for every $x$. Otherwise, for certain $x$'s there is an $x' = x + w$ with $\|x'\|_1 < \|x\|_1$ that violates the respective robustness definition, and such an $x'$ can be a solution to (2). The detailed arguments are similar to those in the proof of the second part of Theorem 2.

■ 

The conditions for "weak", "sectional" and "strong" robustness look very similar, yet they differ substantially:

• The "weak" robustness condition concerns $x$ with a specific $x_K$ on a specific subset $K$.
• The "sectional" robustness condition concerns $x$ with all possible $x_K$'s on a specific subset $K$.
• The "strong" robustness condition concerns $x$'s with all possible $x_K$'s on all possible subsets.

Essentially, the "weak" robustness condition (56) guarantees two things when $\|x_{\bar{K}}\|_1$ is small: the $\ell_1$ norm of $\hat{x}_K$ is close to the $\ell_1$ norm of $x_K$, and the error vector $w_{\bar{K}}$ is small in $\ell_1$ norm. If we define

$$\kappa = \max_{Aw = 0,\ w \neq 0} \frac{\|w_K\|_1}{\|w_{\bar{K}}\|_1},$$

then, since $\|w\|_1 \le (1 + \kappa)\|w_{\bar{K}}\|_1$,

$$\|x - \hat{x}\|_1 \le \frac{2C(1 + \kappa)}{C-1} \|x_{\bar{K}}\|_1.$$

So if $\kappa$ is finite for a measurement matrix $A$, then $\|x - \hat{x}\|_1$ is also small whenever $\|x_{\bar{K}}\|_1$ is small. For a given matrix $A$, $\kappa < \infty$ as long as the rank of the matrix $A_K$ equals $|K| = k$. This generally holds for $k \le m$.
The "weak" robustness condition concerns only one specific signal $x$. The "sectional" robustness condition instead guarantees that, for any approximately $k$-sparse signal mainly supported on the subset $K$, $\ell_1$ minimization returns a solution $\hat{x}$ close to the original signal in the sense of (3). Suppose we measure an approximately $k$-sparse signal $x$ using a randomly generated measurement matrix $A$, where the support of the $k$ largest-magnitude components is fixed but unknown to the decoder. The "sectional" robustness condition then characterizes the probability that the $\ell_1$ minimization solution satisfies (3) for all signals on the set $K$. If that probability goes to 1 as $n \to \infty$ for any subset $K$, then there exist measurement matrices $A$ that guarantee (3) on "almost all" support sets (that is, (3) is "almost always" satisfied). The "strong" robustness condition instead guarantees recovery for approximately sparse signals mainly supported on any subset $K$. It is useful for guaranteeing the decoding bound simultaneously for all approximately $k$-sparse signals under a single measurement matrix $A$.

Interestingly, take $C = 1$ and require (56), (57) and (58) to hold with strict inequality for all $w \neq 0$ in the null space of $A$. These conditions then become necessary and sufficient conditions for unique exact recovery of ideally $k$-sparse signals in the "weak", "sectional" and "strong" senses [Don06b]. That is, they characterize, respectively:

• the unique exact recovery of a specific ideally $k$-sparse signal;
• the unique exact recovery of all ideally $k$-sparse signals on a specific support set $K$;
• the unique exact recovery of all ideally $k$-sparse signals on all possible support sets $K$.

Indeed, if $\|x_{\bar{K}}\|_1 = 0$, triangle inequality derivations similar to those in Theorem 1 give $\hat{x} = x$ under all three conditions.

For a given value $\delta = m/n$ and any value $C > 1$, we determine the feasible values of $\zeta = k/n$ for which some sequence of matrices $A$ satisfies these three conditions as $n \to \infty$ with $m/n = \delta$. The statements of (56), (57) and (58) and the discussion in Section III show that the Grassmann angle approach extends naturally to bounding the probabilities that (56), (57) and (58) fail. We denote these probabilities by $P_1$, $P_2$ and $P_3$ respectively. There are $\binom{n}{k}$ possible support sets $K$ and $2^k$ possible sign patterns for the signal $x_K$. From the previous discussions, the event that condition (56) fails is the same for all $x_K$'s with a specific support set and a specific sign pattern. Then, following the same line of reasoning as in Section III, we have

$$P_1 = P_K, \qquad (59)$$

$$P_2 \le 2^k \times P_1, \qquad (60)$$

$$P_3 \le \binom{n}{k} \times 2^k \times P_1, \qquad (61)$$

where $P_K$ is the probability in (10).

We have the following lemma about $P_1$, $P_2$ and $P_3$:

Lemma 13: For any $C > 1$, define $\zeta_W(\delta)$, $\zeta_{Sec}(\delta)$ and $\zeta_S(\delta)$ to be the largest fractions $\zeta = k/n$ such that conditions (56), (57) and (58), respectively, hold with overwhelming probability as $n \to \infty$ when the $(n-m)$-dimensional null space is sampled uniformly, where $m/n = \delta$. Then

$$\zeta_W(\delta) > 0, \qquad \zeta_{Sec}(\delta) > 0, \qquad \zeta_S(\delta) > 0$$

for any $C > 1$ and $\delta > 0$. Also,

$$\lim_{\delta \to 1} \zeta_W(\delta) = 1$$

for any $C > 1$.

The proof of this lemma is given in the appendix. The formula for $P_1$ is exact, since no union bound is involved, so the threshold bound for "weak" robustness is tight. In summary, the results in this section suggest that robustness for approximately sparse signals can hold even when $k$ is very close to the weak threshold for ideally sparse signals. By contrast, results based on restricted isometry conditions [CRT05] may suggest a smaller sparsity level for recovery robustness. This is the first result of this kind. Section X presents numerical values of $\zeta$ for which $P_1$, $P_2$ and $P_3$ converge to zero.

IX. Analysis of ℓ1 Minimization under Noisy Measurements

The previous sections analyzed the $\ell_1$ minimization algorithm for decoding general signals. In this section, we use the null space characterization to discuss the effect of noisy measurements on $\ell_1$ minimization for general signals.
Theorem 4: Assume that $A$ is a general $m \times n$ measurement matrix with rank $m$, and denote its minimum nonzero singular value by $\sigma_{min}$. Further, assume that $y = Ax + b$, where $\|b\|_2 \le \epsilon$, and that $w$ is an $n \times 1$ vector. Let $K$ be any subset of $\{1, 2, \ldots, n\}$ with cardinality $|K| = k$, and let $K_i$ denote the $i$-th element of $K$. Further, let $\bar{K} = \{1, 2, \ldots, n\} \setminus K$. Then, for $C > 1$, the solution $\hat{x}$ produced by (2) satisfies

$$\|x - \hat{x}\|_1 \le \frac{2(C+1)}{C-1} \|x_{\bar{K}}\|_1 + \frac{(3C+1)\sqrt{n}\,\epsilon}{(C-1)\sigma_{min}}$$

provided the following holds: for all $w \in \mathbb{R}^n$ such that

$$Aw = 0$$

and for all subsets $K$ with $|K| = k$,

$$C \sum_{i=1}^{k} |w_{K_i}| \le \sum_{i=1}^{n-k} |w_{\bar{K}_i}|. \qquad (62)$$

Proof: Since

$$y = Ax + b,$$

we can write

$$y = Ax^*,$$

where

$$\|x^* - x\|_2 \le \frac{\epsilon}{\sigma_{min}}.$$

For instance, take $x^* = x + A^T(AA^T)^{-1}b$, which exists because $A$ has rank $m$. By the Cauchy-Schwarz inequality, we have

$$\|x^* - x\|_1 \le \frac{\sqrt{n}\,\epsilon}{\sigma_{min}}.$$

Suppose the matrix $A$ has the claimed null space property. The solution $\hat{x}$ of (2) satisfies $\|\hat{x}\|_1 \le \|x^*\|_1$. Since $A\hat{x} = y$, the vector $w = \hat{x} - x^*$ lies in the null space of $A$. Therefore $\|x^*\|_1 \ge \|x^* + w\|_1$. Using the triangle inequality for the $\ell_1$ norm, we obtain

$$\|x^*_K\|_1 + \|x^*_{\bar{K}}\|_1 = \|x^*\|_1 \ge \|\hat{x}\|_1 = \|x^* + w\|_1 \ge \|x^*_K\|_1 - \|w_K\|_1 + \|w_{\bar{K}}\|_1 - \|x^*_{\bar{K}}\|_1 \ge \|x^*_K\|_1 - \|x^*_{\bar{K}}\|_1 + \frac{C-1}{C+1}\|w\|_1,$$
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where the last inequality is from the claimed null space property. Relating the first equality and 

r* II ^ ('^-l) 



the last inequality above, we have 2||xi^||i > v., | ||w||i. 



Since 



we get 



x^lli < ||x7^||i + ||x* -x||i, 



2(C+1 ) 
C-1 



|w||i < —^ ^||x2^||i 



^ 2(C+1)„ „ 2(C+1)„ , 

< -^ 7^l|x7?||i + ^^ ^||x*-x||i. 



C-1 " ^"^ ' C-1 



From the triangular inequality, 



|x — x||i < ||x — x*||i + ||w||i (63) 

^ 2(C + 1)„ „ 3C + 1„ , 

- ~r — ^ll^:^lli + 77^ — Tw — ■ ^"^^ 
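As a small numerical companion to Theorem 4, the sketch below evaluates the right-hand side of its error bound for given $C$, $\|x_{\bar K}\|_1$, $n$, $\epsilon$ and $\sigma_{\min}$, and checks exactly (with rational arithmetic) the constant identity $1 + \frac{2(C+1)}{C-1} = \frac{3C+1}{C-1}$ used in the last step of the proof. The function names and sample inputs are illustrative assumptions, not part of the paper.

```python
import math
from fractions import Fraction

def theorem4_error_bound(C, tail_l1, n, eps, sigma_min):
    """Right-hand side of the Theorem 4 bound on ||x - x_hat||_1.

    C         -- balancedness constant of the null space (must exceed 1)
    tail_l1   -- ||x_{K-bar}||_1, the l1 norm of x outside the set K
    n         -- signal dimension
    eps       -- bound on the l2 norm of the noise b
    sigma_min -- smallest nonzero singular value of A
    """
    if C <= 1:
        raise ValueError("Theorem 4 requires C > 1")
    tail_term = 2 * (C + 1) / (C - 1) * tail_l1
    noise_term = (3 * C + 1) * math.sqrt(n) * eps / ((C - 1) * sigma_min)
    return tail_term + noise_term

def constants_agree(C):
    """Exact check of 1 + 2(C+1)/(C-1) == (3C+1)/(C-1) for a rational C > 1."""
    C = Fraction(C)
    return 1 + 2 * (C + 1) / (C - 1) == (3 * C + 1) / (C - 1)

if __name__ == "__main__":
    # C = 2: tail factor 6, noise term 7 * sqrt(100) * 0.1 / 5 = 1.4, so about 7.4 in total
    print(theorem4_error_bound(2, 1.0, 100, 0.1, 5.0))
    print(constants_agree(2), constants_agree(Fraction(3, 2)))
```

Note how the tail factor $2(C+1)/(C-1)$ blows up as $C \to 1$: demanding a more balanced null space (larger $C$) buys a smaller error constant.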



If the elements of the measurement matrix $A$ are i.i.d. unit real Gaussian random variables $\mathcal{N}(0,1)$, then, following upon the work of Marchenko and Pastur [MP67], Geman [Gem80] and Silverstein [Sil85] proved that for $m/n = \delta$,

$$\frac{\sigma_{\min}}{\sqrt{n}} \to 1 - \sqrt{\delta}$$

almost surely as $n \to \infty$.

Then, almost surely as $n \to \infty$,

$$\frac{(3C+1)\sqrt{n}\,\epsilon}{(C-1)\sigma_{\min}} \to \frac{(3C+1)\,\epsilon}{(C-1)(1-\sqrt{\delta})}.$$

So in this case, $\|x - \hat{x}\|_1$ exceeds $\frac{2(C+1)}{C-1}\|x_{\bar K}\|_1$ by at most a constant times $\epsilon$. It is also worth mentioning that the error bound derived above is for a plain $\ell_1$ minimization program, which does not use any prior knowledge of the noise magnitude, whereas the error bounds in the literature often assume that such information is known and used in the convex program. To get an error bound in terms of the $\ell_2$ norm, we can invoke the almost Euclidean property of the null space, namely that every vector $w$ in it has an $\ell_2$ norm scaling as $O(1/\sqrt{n})$ of its $\ell_1$ norm. Though we choose not to do it in detail in this paper, it is easy to see that the error bound here has the same scaling in $\epsilon$ as the analysis through the restricted isometry property [CRT06]; however, this analysis is valid even when the cardinality $|K|$ of the set $K$ is much larger than the known cardinality bounds for the restricted isometry property. It is also possible to extend the concepts of "weak", "sectional" and "strong" robustness analysis to noisy measurements, which will similarly show that even if the cardinality of the set $K$ is very close to the "weak" threshold for ideally sparse signals, $\ell_1$ minimization can still be robust to noisy measurements.
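The almost-sure limit $\sigma_{\min}/\sqrt{n} \to 1 - \sqrt{\delta}$ quoted in this section can be checked empirically. The sketch below, assuming NumPy is available, draws one $m \times n$ matrix with i.i.d. $\mathcal{N}(0,1)$ entries and $m = \delta n$, and compares the scaled smallest singular value with $1 - \sqrt{\delta}$. At finite $n$ the two agree only approximately; the sizes and seeds are arbitrary choices.

```python
import numpy as np

def scaled_min_singular_value(n, delta, seed=0):
    """sigma_min(A) / sqrt(n) for one m x n matrix A with i.i.d. N(0,1) entries, m = delta * n."""
    rng = np.random.default_rng(seed)
    m = int(round(delta * n))
    A = rng.standard_normal((m, n))
    # For m < n all m singular values are nonzero with probability 1.
    s = np.linalg.svd(A, compute_uv=False)
    return s.min() / np.sqrt(n)

if __name__ == "__main__":
    delta = 0.5555
    for n in (200, 1000):
        print(n, scaled_min_singular_value(n, delta), 1 - np.sqrt(delta))
```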

X. Numerical Computations on the Bounds of $\zeta$

In this section, we numerically evaluate the performance bounds on $\zeta = k/n$ such that the conditions (7), (56), (57) and (58) are satisfied with overwhelming probability as $n \to \infty$. First, we know that the probability $P$ that condition (7) fails is bounded by a sum over $s \ge 0$ and over the faces $G \in \Im_{m+1+2s}(SP)$, each term being a product of combinatorial factors and the angle factors $\beta(F,G)\gamma(G,SP)$.

Recall that we assume $m/n = \delta$, $l = (m+1+2s)+1$ and $\nu = l/n$. In order to make $P$ converge to zero overwhelmingly as $n \to \infty$, following the discussions in Section IV, one sufficient condition is to make sure that the exponent for the combinatorial factors

$$\psi_{com} = \lim_{n\to\infty} \frac{\log(\text{combinatorial factors})}{n} \qquad (67)$$

and the negative exponent for the angle factors

$$\psi_{angle} = -\lim_{n\to\infty} \frac{\log\big(\beta(F,G)\,\gamma(G,SP)\big)}{n} \qquad (68)$$

satisfy $\psi_{com} - \psi_{angle} < 0$ uniformly over $\nu \in [\delta, 1)$.
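Exponents such as $\psi_{com}$ are limits of normalized logarithms of binomial coefficients, and the basic fact behind them is $\frac{1}{n}\log\binom{n}{pn} \to H(p)$, where $H(p) = -p\log p - (1-p)\log(1-p)$ is the entropy function (natural logarithms assumed here). The sketch below checks this convergence with `math.lgamma`; it illustrates only the binomial building block, not the paper's full combinatorial factor.

```python
import math

def entropy(p):
    """Natural-log binary entropy H(p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def log_binom(n, k):
    """log C(n, k) via log-gamma, which avoids overflow for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def normalized_log_binom(n, p):
    """(1/n) log C(n, round(p n)); tends to H(p) as n grows."""
    return log_binom(n, round(p * n)) / n

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, normalized_log_binom(n, 0.3), entropy(0.3))
```

The gap shrinks roughly like $(\log n)/n$, which is why only the entropy term survives in the exponent.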

Following [Don06b], we take $m = 0.5555n$. By analyzing the decay exponents of the external angles and internal angles through the Laplace method as in Sections VI and VII, we can compute the numerical results shown in Figure 3, Figure 6 and Figure 7. In Figure 3, we show the largest sparsity level $\zeta = k/n$ (as a function of $C$) which makes the failure probability of the condition (9) approach zero asymptotically as $n \to \infty$. As we can see, when $C = 1$, we get the same bound $\zeta = 0.095 \times 0.5555 \approx 0.0528$ as obtained for the "weak" threshold for ideally sparse signals in [Don06b]. As expected, as $C$ grows, $\ell_1$ minimization requires a smaller sparsity level $\zeta$ to achieve a higher signal recovery accuracy.

In Figure 4, we show the exponents $\psi_{com}$, $\psi_{int}$, $\psi_{ext}$ under the parameters $C = 2$, $\delta = 0.5555$ and $\zeta = 0.0265$. For the same set of parameters, in Figure 5, we compare the exponents $\psi_{com}$ and $\psi_{angle}$: the solid curve denotes $\psi_{angle}$ and the dashed curve denotes $\psi_{com}$. The figure shows that, under $\zeta = 0.0265$, $\psi_{com} - \psi_{angle} < 0$ uniformly over $\delta \le \nu \le 1$. Indeed, $\zeta = 0.0265$ is the bound shown in Figure 3 for $C = 2$. In Figure 6, for the parameter $\delta = 0.5555$, we give the bounds on $\zeta$ as a function of $C$ for satisfying the signal recovery robustness conditions (56), (57) and (58), respectively, in the "weak", "sectional" and "strong" senses. In Figure 7, fixing $C = 2$, we plot how large $\rho = \zeta/\delta$ can be for different values of $\delta$ while satisfying the signal recovery robustness conditions (56), (57) and (58), respectively, in the "weak", "sectional" and "strong" senses.

Fig. 3: Allowable sparsity as a function of $C$ (allowable imperfection of the recovered signal is $\frac{2(C+1)}{C-1}\|x_{\bar K}\|_1$)

Fig. 4: The Combinatorial, Internal and External Angle Exponents (curves: Combinatorial, Internal Angle, External Angle)

Fig. 5: The Combinatorial Exponents and the Angle Exponents (curves: Combinatorial Exponent, Angle Exponent)

Fig. 6: The Weak, Sectional and Strong Robustness Bounds



XI. Conclusion

It is well known that $\ell_1$ optimization can be used to recover ideally sparse signals in compressive sensing if the underlying signal is sparse enough. While for ideally sparse signals the results of [Don06b] have given very sharp bounds on the sparsity threshold up to which $\ell_1$ minimization can recover the signal, sharp bounds for the recovery of general signals or approximately sparse signals were not available.

In this paper we analyzed a null space characterization of the measurement matrices for bounding the performance of $\ell_1$-norm optimization for general or approximately sparse signals. Using high-dimensional geometry tools, we give a unified null space Grassmann angle-based analytical framework for compressive sensing. This new framework gives sharp quantitative tradeoffs between the signal sparsity parameter and the recovery accuracy of $\ell_1$ optimization for general or approximately sparse signals. As expected, the neighborly polytopes result of [Don06b] for ideally sparse signals can be viewed as a special case on this tradeoff curve. It can therefore be of practical use in applications where the underlying signal is not ideally sparse and where we are interested in the quality of the recovered signal. For example, using the results and their extensions in this paper and [Don06b], we are able to give a precise sparsity threshold analysis for weighted $\ell_1$ minimization when prior information about the signal vector is available [KXAH09]. In [XKAH10], using the robustness result from this paper, we are able to show that a polynomial-time iterative weighted $\ell_1$ minimization algorithm can provably improve over the sparsity threshold of $\ell_1$ minimization for interesting classes of signals, even when prior information is not available.

Fig. 7: The Weak, Sectional and Strong Robustness Bounds

In essence, this work investigates the fundamental "balancedness" property of linear subspaces, and may be of independent mathematical interest. In future work, it would be interesting to obtain a more accurate analysis for compressive sensing under noisy measurements than the one presented in the current paper.
XII. Appendix

A. Some Concepts in High-Dimensional Geometry

In this part, we explain several geometric terms often used in this paper, for quick reference.

1) The Grassmann Manifold: The Grassmann manifold $Gr_i(j)$ refers to the set of $i$-dimensional subspaces of the $j$-dimensional Euclidean space $\mathbb{R}^j$. It is known that there exists a unique invariant measure $\mu'$ on $Gr_i(j)$ such that $\mu'(Gr_i(j)) = 1$.

For more facts on the Grassmann manifold, please see [Boo86].

2) Polytope, Face, Vertex: A polytope in this paper refers to the convex hull of a finite number of points in Euclidean space. Any extreme point of a polytope is a vertex of this polytope. A face of a polytope is defined as the convex hull of a set of its vertices such that no point in this convex hull is an interior point of the polytope. The dimension of a face refers to the dimension of the affine hull of that face. The book [Gru03] offers a nice reference on convex polytopes.

3) Cross-polytope: The $n$-dimensional cross-polytope is the polytope of the unit $\ell_1$ ball, namely the set

$$\{x \in \mathbb{R}^n \mid \|x\|_1 \le 1\}.$$

The $n$-dimensional cross-polytope has $2n$ vertices, namely $\pm e_1, \pm e_2, \ldots, \pm e_n$, where $e_i$, $1 \le i \le n$, is the unit vector whose $i$-th coordinate is 1. Any $k$ extreme points without opposite pairs at the same coordinate constitute a $(k-1)$-dimensional face of the cross-polytope. So the cross-polytope has $2^k\binom{n}{k}$ faces of dimension $(k-1)$.
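The face count $2^k\binom{n}{k}$ can be confirmed by brute force for small $n$. The sketch below encodes each vertex $\pm e_i$ as a (coordinate, sign) pair and counts the $k$-vertex subsets that use $k$ distinct coordinates, relying on the characterization of faces stated above rather than on a general face-enumeration routine.

```python
from itertools import combinations
from math import comb

def cross_polytope_vertices(n):
    """Vertices +e_i and -e_i of the n-dimensional cross-polytope, encoded as (coordinate, sign)."""
    return [(i, s) for i in range(n) for s in (+1, -1)]

def count_faces(n, k):
    """Count k-vertex subsets with no opposite pair, i.e. the (k-1)-dimensional faces."""
    count = 0
    for subset in combinations(cross_polytope_vertices(n), k):
        coords = [i for i, _ in subset]
        if len(set(coords)) == k:  # no coordinate used twice, hence no pair +e_i, -e_i
            count += 1
    return count

if __name__ == "__main__":
    for n in range(1, 6):
        for k in range(1, n + 1):
            assert count_faces(n, k) == 2**k * comb(n, k)
    print("face counts match 2^k * C(n, k)")
```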

4) The Grassmann Angle: The Grassmann angle of an $n$-dimensional cone $\mathcal{C}$ under the Grassmann manifold $Gr_i(n)$ is the measure of the set of $i$-dimensional subspaces (over $Gr_i(n)$) which intersect the cone $\mathcal{C}$ nontrivially (namely, at some point other than the origin). For more details on the Grassmann angle, internal angle, and external angle, please refer to [Gru68][Gru03][McM75].

5) The Internal Angle: An internal angle $\beta(F_1, F_2)$ between two faces $F_1$ and $F_2$ of a polytope or a polyhedral cone is the fraction of the hypersphere $S$ covered by the cone obtained by observing the face $F_2$ from the face $F_1$. The internal angle $\beta(F_1, F_2)$ is defined to be zero when $F_1 \not\subseteq F_2$ and is defined to be one if $F_1 = F_2$. Note that the dimension of the hypersphere $S$ here matches the dimension of the corresponding cone. Also, the center of the hypersphere is the apex of the corresponding cone. All these conventions also apply to the definition of the external angles.

6) The External Angle: An external angle $\gamma(F_3, F_4)$ between two faces $F_3$ and $F_4$ of a polytope or a polyhedral cone is the fraction of the hypersphere $S$ covered by the cone of outward normals to the hyperplanes supporting the face $F_4$ at the face $F_3$. The external angle $\gamma(F_3, F_4)$ is defined to be zero when $F_3 \not\subseteq F_4$ and is defined to be one if $F_3 = F_4$.

B. Proof of Lemma 1 

Proof: The first statement is obvious since multiplying $A$ by a unitary matrix keeps the columns independent and the entries i.i.d. Gaussian.

Now let us look at the proof of the second statement. Consider the singular value decomposition (SVD) $A = U\Sigma V^*$, where $U$ and $V$ have orthonormal columns and $\Sigma$ is diagonal. Consider now $A\Theta$ for any given deterministic unitary $\Theta$: $A\Theta = U\Sigma V^*\Theta$. This is clearly the SVD of $A\Theta$; in particular, $\Theta^*V$ represents the right singular vectors of $A\Theta$. Since $A$ and $A\Theta$ have the same distribution (for all unitary $\Theta$), the same must be true of the right singular vectors $V$ and $\Theta^*V$. Therefore the distribution of $V$ is left-rotationally invariant: $p_V(V) = p_V(\Theta^*V)$. Now the null space of $A$ can be written as $Z = V^\perp X$, where $V^\perp$ is an $n \times (n-m)$ matrix with orthonormal columns that are orthogonal to $V$, i.e., $V^*V^\perp = 0$, and $X$ is any invertible $(n-m) \times (n-m)$ matrix. Now it is easy to see that if we change $V$ to $\Theta^*V$, for any unitary $\Theta$, we must change $V^\perp$ to $\Theta^*V^\perp$. But since left-multiplication by a unitary $\Theta^*$ does not change the distribution of $V$, left-multiplication by a unitary $\Theta^*$ must not change the distribution of $V^\perp$. Thus $V^\perp$, and by fiat $Z = V^\perp X$, are left-rotationally invariant. Note that, to simplify the arguments, we have so far assumed that the matrix $A$ has full rank $m$, which is true with probability 1. However, these arguments also work when the matrix $A$ is rank-deficient.

Let $G$ be an $n \times (n-m)$ matrix with i.i.d. $\mathcal{N}(0,1)$ entries and consider the QR decomposition $G = QR$, where $Q$ is an $n \times (n-m)$ matrix with orthonormal columns and $R$ is an $(n-m) \times (n-m)$ upper triangular matrix with non-negative diagonal entries. Then it is well known that $Q$ has a left-rotationally invariant distribution, and that $R$ is a random matrix, independent of $Q$, whose strictly upper triangular entries are i.i.d. $\mathcal{N}(0,1)$ and whose $i$-th diagonal entry is the square root of an independent chi-square random variable with $n-i+1$ degrees of freedom [Mui05]. This implies that we can always take the $V^\perp$ obtained from the second statement and post-multiply it by an independent upper triangular $R$ (with the aforementioned distribution) to obtain a matrix $Z = V^\perp R$ with i.i.d. $\mathcal{N}(0,1)$ entries. Hence it is always possible to choose a basis $Z$ for the null space such that $Z$ has i.i.d. $\mathcal{N}(0,1)$ entries.
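The distribution of the QR factors can be spot-checked by simulation. The sketch below, assuming NumPy, factors Gaussian matrices, flips signs so that $R$ has a non-negative diagonal (NumPy's `qr` does not enforce this), and compares the empirical mean of $R_{ii}^2$ with $n-i+1$, the mean of a chi-square variable with $n-i+1$ degrees of freedom. Dimensions, trial count and seed are arbitrary illustration choices.

```python
import numpy as np

def qr_nonneg(G):
    """Reduced QR decomposition G = QR with the diagonal of R made non-negative."""
    Q, R = np.linalg.qr(G)  # Q is n x p, R is p x p
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    # Flipping column j of Q and row j of R by the same sign leaves Q @ R unchanged.
    return Q * signs, R * signs[:, None]

def mean_squared_diagonal(n, p, trials=20000, seed=0):
    """Empirical mean of R_ii^2 over random n x p Gaussian matrices."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(p)
    for _ in range(trials):
        _, R = qr_nonneg(rng.standard_normal((n, p)))
        acc += np.diag(R) ** 2
    return acc / trials

if __name__ == "__main__":
    n, p = 5, 3
    print(mean_squared_diagonal(n, p))           # approximately [5, 4, 3]
    print([n - i + 1 for i in range(1, p + 1)])  # chi-square means n - i + 1
```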



C. Derivation of the Internal Angles 

There are two situations in the derivation of the internal angles $\beta(F,G)$ for the skewed cross-polytope: when $G$ is a regular face and when $G$ is the whole skewed cross-polytope $SP$. These two cases are dealt with in Lemma 14 and Lemma 15, respectively.

Lemma 14: Suppose that $F$ is a $(k-1)$-dimensional face of the skewed cross-polytope

$$SP = \{y \in \mathbb{R}^n \mid \|y_K\|_1 + \|y_{\bar K}\|_1/C \le 1\}$$

supported on the subset $K$ with $|K| = k$. Then the internal angle $\beta(F,G)$ between the $(k-1)$-dimensional face $F$ and an $(l-1)$-dimensional face $G$ ($F \subseteq G$, $G \ne SP$) is given by

$$\beta(F,G) = \frac{V_{l-k-1}\big(\frac{1}{1+C^2k},\ l-k-1\big)}{V_{l-k-1}(S^{l-k-1})}, \qquad (69)$$

where $V_i(S^i)$ denotes the $i$-dimensional surface measure of the unit sphere $S^i$, while $V_i(\alpha', i)$ denotes the surface measure of the regular spherical simplex with $(i+1)$ vertices on the unit sphere $S^i$ and with inner product $\alpha'$ between these $(i+1)$ vertices. (69) is equal to $B\big(\frac{1}{1+C^2k},\ l-k\big)$, where

$$B(\alpha', m') = \theta^{\frac{m'-1}{2}}\sqrt{(m'-1)\alpha'+1}\;\pi^{-m'/2}\,\alpha'^{-1/2}\,J(m', \theta) \qquad (70)$$

with $\theta = (1-\alpha')/\alpha'$ and

$$J(m', \theta) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}\left(\int_0^{\infty} e^{-\theta v^2 + 2iv\lambda}\,dv\right)^{m'} e^{-\lambda^2}\,d\lambda. \qquad (71)$$
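The function $B(\alpha', m')$ in (70) and (71) can be evaluated numerically. The sketch below, assuming SciPy, uses the closed form $\int_0^\infty e^{-\theta v^2 + 2iv\lambda}dv = \frac{1}{2}\sqrt{\pi/\theta}\,e^{-\lambda^2/\theta} + i\,D(\lambda/\sqrt{\theta})/\sqrt{\theta}$, where $D$ is Dawson's function (`scipy.special.dawsn`). Because the inner integral at $-\lambda$ is the complex conjugate of its value at $\lambda$, (71) reduces to a real integral over $[0, \infty)$. As sanity checks, a single ray ($m' = 1$) covers half of $S^0$, and two unit vectors with inner product $\alpha'$ ($m' = 2$) cut out an arc that is a fraction $\arccos(\alpha')/(2\pi)$ of the circle. This is an illustrative evaluation under these assumptions, not the authors' code; plain quadrature may lose accuracy for large $m'$.

```python
import math
from scipy.integrate import quad
from scipy.special import dawsn

def inner_integral(lam, theta):
    """Closed form of int_0^inf exp(-theta v^2 + 2 i v lam) dv, for theta > 0."""
    root = math.sqrt(theta)
    real = 0.5 * math.sqrt(math.pi / theta) * math.exp(-lam * lam / theta)
    imag = float(dawsn(lam / root)) / root
    return complex(real, imag)

def J(m, theta):
    """J(m', theta) from (71); imaginary parts cancel by conjugate symmetry in lambda."""
    integrand = lambda lam: (inner_integral(lam, theta) ** m).real * math.exp(-lam * lam)
    value, _ = quad(integrand, 0.0, math.inf, limit=200)
    return 2.0 * value / math.sqrt(math.pi)

def B(alpha, m):
    """B(alpha', m') from (70), for 0 < alpha' < 1."""
    theta = (1.0 - alpha) / alpha
    return (theta ** ((m - 1) / 2) * math.sqrt((m - 1) * alpha + 1)
            * math.pi ** (-m / 2) * alpha ** -0.5 * J(m, theta))

def internal_angle(k, l, C):
    """beta(F, G) = B(1 / (1 + C^2 k), l - k) from Lemma 14."""
    return B(1.0 / (1.0 + C * C * k), l - k)

if __name__ == "__main__":
    print(B(0.5, 2), 1 / 6)  # arccos(1/2) / (2 pi) = 1/6
    print(internal_angle(k=2, l=5, C=2.0))
```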

Proof: Without loss of generality, assume that $F$ is a $(k-1)$-dimensional face with the $k$ vertices $e_p$, $1 \le p \le k$, where $e_p$ is the $n$-dimensional standard unit vector whose $p$-th element is 1; also assume that the $(l-1)$-dimensional face $G$ is the convex hull of the $l$ vertices $e_p$, $1 \le p \le k$, and $Ce_p$, $(k+1) \le p \le l$. Then the cone $Con_{F,G}$ formed by observing the $(l-1)$-dimensional face $G$ of the skewed cross-polytope $SP$ from an interior point $x^F$ of the face $F$ is the positive cone of the vectors

$$Ce_j - e_i, \quad \text{for all } j \in J \setminus K,\ i \in K, \qquad (72)$$

and also the vectors

$$e_{i_1} - e_{i_2}, \quad \text{for all } i_1 \in K,\ i_2 \in K, \qquad (73)$$

where $J = \{1, 2, \ldots, l\}$ is the support set of the face $G$.

So the cone $Con_{F,G}$ is the direct sum of the linear hull $L_F = \mathrm{lin}\{F - x^F\}$ formed by the vectors in (73) and the cone $Con_{F^\perp,G} = Con_{F,G} \cap L_F^\perp$, where $L_F^\perp$ is the orthogonal complement of the linear subspace $L_F$. Then $Con_{F^\perp,G}$ has the same spherical volume as $Con_{F,G}$.

Now let us analyze the structure of $Con_{F^\perp,G}$. We notice that the vector

$$e_0 = \sum_{p=1}^{k} e_p$$

is in the linear space $L_F^\perp$ and is also the only such vector (up to scaling) supported on $K$. Thus a vector $x$ in the positive cone $Con_{F^\perp,G}$ must take the form

$$-\sum_{i=1}^{k} b_i \times e_i + \sum_{i=k+1}^{l} b_i \times e_i, \qquad (74)$$

where $b_i$, $1 \le i \le l$, are nonnegative real numbers and

$$C\sum_{i=1}^{k} b_i = \sum_{i=k+1}^{l} b_i, \qquad b_1 = b_2 = \cdots = b_k.$$

That is to say, the $(l-k)$-dimensional cone $Con_{F^\perp,G}$ is the positive cone of the $(l-k)$ vectors $a^1, a^2, \ldots, a^{l-k}$, where

$$a^i = C \times e_{k+i} - \sum_{p=1}^{k} e_p/k, \qquad 1 \le i \le (l-k).$$

The normalized inner product between any two of these $(l-k)$ vectors is

$$\frac{\langle a^i, a^j \rangle}{\|a^i\|\,\|a^j\|} = \frac{k \times \frac{1}{k^2}}{C^2 + k \times \frac{1}{k^2}} = \frac{1}{1 + kC^2}.$$

(In fact, the $a^i$'s are also the vectors obtained by observing the vertices $Ce_{k+1}, \ldots, Ce_l$ from $E_c = \sum_{p=1}^{k} e_p/k$, the epicenter of the face $F$.)
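A direct numerical check of the normalized inner product $\frac{1}{1+kC^2}$: the sketch below, assuming NumPy, builds the vectors $a^i = Ce_{k+i} - \sum_{p=1}^{k} e_p/k$ in $\mathbb{R}^l$ as matrix rows and compares all pairwise cosines with the closed form. The parameter triples are arbitrary.

```python
import numpy as np

def cone_generators(k, l, C):
    """Rows are a^i = C e_{k+i} - (1/k) sum_{p<=k} e_p, i = 1, ..., l-k (0-based columns)."""
    A = np.zeros((l - k, l))
    A[:, :k] = -1.0 / k
    A[np.arange(l - k), k + np.arange(l - k)] = C
    return A

def pairwise_cosines(A):
    """Off-diagonal normalized inner products between the rows of A."""
    U = A / np.linalg.norm(A, axis=1, keepdims=True)
    G = U @ U.T
    return G[~np.eye(len(G), dtype=bool)]

if __name__ == "__main__":
    k, l, C = 3, 7, 2.0
    print(pairwise_cosines(cone_generators(k, l, C)), 1 / (1 + k * C**2))
```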

We have so far reduced the computation of the internal angle to evaluating (69), the relative spherical volume of the cone $Con_{F^\perp,G}$ with respect to the sphere surface $S^{l-k-1}$. This was computed as given in this lemma [VS92], [KBH99] for the positive cones of vectors with equal inner products by using a transformation of variables and the well-known formula

$$V_{i-1}(S^{i-1}) = \frac{i\,\pi^{i/2}}{\Gamma\big(\frac{i}{2} + 1\big)},$$

where $\Gamma(\cdot)$ is the usual Gamma function.

Instead, in this paper, we give a proof of (70) which leads directly to the probabilistic large deviation method of evaluating the internal angle exponent in [Don06b].

First, we notice that $Con_{F^\perp,G}$ is an $(l-k)$-dimensional cone. Also, all the vectors $(x_1, \ldots, x_n)$ in the cone $Con_{F^\perp,G}$ take the form in (74). From [Had79],

$$\int_{Con_{F^\perp,G}} e^{-\|x'\|^2}\,dx' = \beta(F,G)\,V_{l-k-1}(S^{l-k-1}) \times \int_0^{\infty} e^{-r^2} r^{l-k-1}\,dr = \beta(F,G)\cdot\pi^{(l-k)/2}, \qquad (75)$$

where $V_{l-k-1}(S^{l-k-1})$ is the spherical volume of the $(l-k-1)$-dimensional sphere $S^{l-k-1}$. Now define $U \subseteq \mathbb{R}^{l-k+1}$ as the set of all nonnegative vectors satisfying

$$x_p \ge 0,\ 1 \le p \le l-k+1, \qquad \sum_{p=2}^{l-k+1} x_p = Ckx_1,$$

and define $f(x_1, \ldots, x_{l-k+1}): U \to Con_{F^\perp,G}$ to be the linear and bijective map

$$f(x_1, \ldots, x_{l-k+1}) = -\sum_{p=1}^{k} x_1 e_p + \sum_{p=k+1}^{l} x_{p-k+1} e_p.$$

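Then

$$\begin{aligned}
\int_{Con_{F^\perp,G}} e^{-\|x'\|^2}\,dx' &= \sqrt{\frac{l+(C^2-1)k}{C^2k}} \int_{\sum_{p=2}^{l-k+1} x_p = Ckx_1,\ x_p \ge 0,\ 2 \le p \le l-k+1} e^{-\|f(x)\|^2}\,dx_2 \cdots dx_{l-k+1} \\
&= \sqrt{\frac{l+(C^2-1)k}{C^2k}} \int_{\sum_{p=2}^{l-k+1} x_p = Ckx_1,\ x_p \ge 0,\ 2 \le p \le l-k+1} e^{-kx_1^2 - x_2^2 - \cdots - x_{l-k+1}^2}\,dx_2 \cdots dx_{l-k+1},
\end{aligned} \qquad (76)$$

where $\sqrt{\frac{l+(C^2-1)k}{C^2k}}$ is due to the change of integral variables.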
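In fact, when

$$x_p \ge 0,\ 2 \le p \le l-k+1, \qquad \sum_{p=2}^{l-k+1} x_p = Ckx_1,$$

the function $f$ is a linear transformation over the variables $x_2, \ldots, x_{l-k+1}$ with the following $(l-k) \times l$ transformation matrix $M$ (disregarding the indices beyond $l$), whose first $k$ columns correspond to $e_1, \ldots, e_k$:

$$M = \begin{pmatrix}
-\frac{1}{Ck} & \cdots & -\frac{1}{Ck} & 1 & 0 & \cdots & 0 \\
-\frac{1}{Ck} & \cdots & -\frac{1}{Ck} & 0 & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \ddots & \vdots \\
-\frac{1}{Ck} & \cdots & -\frac{1}{Ck} & 0 & 0 & \cdots & 1
\end{pmatrix}.$$

Since $MM^T = I_{l-k} + \frac{1}{C^2k}\mathbf{1}\mathbf{1}^T$, the Jacobian of this transformation is $\sqrt{\det MM^T} = \sqrt{\frac{l+(C^2-1)k}{C^2k}}$, which accounts for the coefficient appearing in (76).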
Now we define a random variable

$$Z = X_2 + X_3 + \cdots + X_{l-k+1} - CkX_1,$$

where $X_1, X_2, \ldots, X_{l-k+1}$ are independent random variables, with $X_p \sim HN(0, \frac{1}{2})$, $2 \le p \le l-k+1$, half-normal distributed random variables and $X_1 \sim \mathcal{N}(0, \frac{1}{2k})$ a normally distributed random variable. Then by inspection, (76) is equal to

$$\frac{\pi^{(l-k+1)/2}}{2^{l-k}} \times \sqrt{l+(C^2-1)k}\;p_Z(0),$$

where $p_Z(\cdot)$ is the probability density function of the random variable $Z$, and $p_Z(0)$ is $p_Z(z)$ evaluated at the point $z = 0$.
Use the notation

$$G_X(\lambda) = \int_{-\infty}^{\infty} e^{i\lambda x}\,p_X(x)\,dx$$

for the characteristic function of any random variable $X$, where $p_X(x)$ is the probability density function of $X$. Then from the independence of $X_1, X_2, \ldots, X_{l-k+1}$, the characteristic function of $Z$ is equal to

$$G_Z(\lambda) = G_{X_1}(-Ck\lambda)\,\big[G_{X_2}(\lambda)\big]^{l-k}.$$

Expressing the probability density function $p_Z(z)$ in the Fourier domain, we have

$$p_Z(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} G_Z(\lambda)\,d\lambda.$$
Combining this with (75), we obtain the desired result (70); and, as the derivation shows, we naturally arrive at a probabilistic interpretation of the internal angle. ■

So far, we have only considered the internal angle $\beta(F,G)$ when $G$ is not the whole skewed cross-polytope. The following lemma discusses the special case $G = SP$.

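Lemma 15: Suppose $F$, $K$ and $SP$ are defined in the same way as in the statement of Lemma 14. Then the internal angle $\beta(F,SP)$ between the $(k-1)$-dimensional face $F$ and the $n$-dimensional skewed cross-polytope $SP$ is given by

$$\beta(F,SP) = \frac{2^{n-k-1}}{\pi^{\frac{n-k}{2}+1}} \int_{-\infty}^{0} \int_{-\infty}^{\infty} \left(\int_0^{\infty} e^{-v^2 + i\lambda v}\,dv\right)^{n-k} e^{-\frac{C^2k\lambda^2}{4}}\,e^{-i\lambda z}\,d\lambda\,dz.$$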



Proof: We use the same set of notations as in the proof of Lemma 14. Without loss of generality, assume $K = \{1, \ldots, k\}$, $F$ is the $(k-1)$-dimensional face supported on $K$, and $G = SP$. So the cone $Con_{F,SP}$ is the direct sum of the linear hull $L_F = \mathrm{lin}\{F - x^F\}$ formed by the vectors in (73) and the cone $Con_{F^\perp,SP} = Con_{F,SP} \cap L_F^\perp$, where $L_F^\perp$ is the orthogonal complement of the linear subspace $L_F$. Then $Con_{F^\perp,SP}$ has the same spherical volume as $Con_{F,SP}$.

Following a similar analysis as in Lemma 14, the cone $Con_{F^\perp,SP}$ is the positive cone of the $2(n-k)$ vectors $a^1_\pm, a^2_\pm, \ldots, a^{n-k}_\pm$, where

$$a^i_\pm = \pm C \times e_{k+i} - \sum_{p=1}^{k} e_p/k, \qquad 1 \le i \le (n-k).$$

This also means that $Con_{F^\perp,SP}$ is an $(n-k+1)$-dimensional cone. Also, all the vectors $(b_1, \ldots, b_n)$ in the cone $Con_{F^\perp,SP}$ take the form

$$b_1 = b_2 = \cdots = b_k \le 0, \qquad \sum_{p=k+1}^{n} |b_p| \le Ck|b_1|.$$



From [Had79],

$$\int_{Con_{F^\perp,SP}} e^{-\|x'\|^2}\,dx' = \beta(F,SP)\,V_{n-k}(S^{n-k}) \times \int_0^{\infty} e^{-r^2} r^{n-k}\,dr = \beta(F,SP)\cdot\pi^{(n-k+1)/2}, \qquad (77)$$
where $V_{n-k}(S^{n-k})$ is the spherical volume of the $(n-k)$-dimensional sphere $S^{n-k}$. Now define $U \subseteq \mathbb{R}^{n-k+1}$ as the set of all vectors of the form

$$\Big\{x_1 \ge 0,\ \sum_{p=2}^{n-k+1} |x_p| \le Ckx_1\Big\}$$

and define $f(x_1, x_2, \ldots, x_{n-k+1}): U \to Con_{F^\perp,SP}$ to be the linear and bijective map

$$f(x_1, \ldots, x_{n-k+1}) = -\sum_{p=1}^{k} x_1 e_p + \sum_{p=k+1}^{n} x_{p-k+1} e_p.$$

Then

$$\int_{Con_{F^\perp,SP}} e^{-\|x'\|^2}\,dx' = \sqrt{k}\int_U e^{-\|f(x)\|^2}\,dx = \sqrt{k}\int_0^{\infty}\int_{\sum_{p=2}^{n-k+1}|x_p| \le Ckx_1} e^{-kx_1^2 - x_2^2 - \cdots - x_{n-k+1}^2}\,dx_2 \cdots dx_{n-k+1}\,dx_1, \qquad (78)$$

where $\sqrt{k}$ is due to the change of integral variables. By inspection, (78) is equal to

$$\sqrt{\pi}^{\,n-k+1}\,P(X_2 + X_3 + \cdots + X_{n-k+1} - CkX_1 \le 0),$$

where $X_1, X_2, \ldots, X_{n-k+1}$ are independent random variables, with $X_p \sim HN(0, \frac{1}{2})$, $2 \le p \le n-k+1$, half-normal distributed random variables and $X_1 \sim \mathcal{N}(0, \frac{1}{2k})$ a normally distributed random variable.

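Expressing the probability density function of $Z = X_2 + X_3 + \cdots + X_{n-k+1} - CkX_1$ in the Fourier domain, we can simplify (78) to

$$\frac{2^{n-k-1}}{\sqrt{\pi}} \int_{-\infty}^{0} \int_{-\infty}^{\infty} \left(\int_0^{\infty} e^{-v^2 + i\lambda v}\,dv\right)^{n-k} e^{-\frac{C^2k\lambda^2}{4}}\,e^{-i\lambda z}\,d\lambda\,dz.$$

Combining this with (77) gives us the desired result. ■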
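Since (77) equals $\beta(F,SP)\,\pi^{(n-k+1)/2}$ and (78) equals $\sqrt{\pi}^{\,n-k+1}P(X_2 + \cdots + X_{n-k+1} - CkX_1 \le 0)$, the internal angle is simply that probability, which lends itself to Monte Carlo estimation. The sketch below, assuming NumPy, samples each $HN(0, \frac{1}{2})$ variable as the absolute value of a $\mathcal{N}(0, \frac{1}{2})$ draw and $X_1 \sim \mathcal{N}(0, \frac{1}{2k})$. For $n - k = 1$ the cone is planar, spanned by $\pm Ce_{k+1} - \sum_{p \le k} e_p/k$, and its half-opening angle is $\arctan(C\sqrt{k})$, so $\arctan(C\sqrt{k})/\pi$ serves as an exact check. Sample size and seed are arbitrary.

```python
import numpy as np

def internal_angle_sp_mc(n, k, C, samples=400_000, seed=0):
    """Monte Carlo estimate of beta(F, SP) = P(X_2 + ... + X_{n-k+1} - C k X_1 <= 0)."""
    rng = np.random.default_rng(seed)
    # HN(0, 1/2): absolute value of a normal variable with variance 1/2.
    half_normals = np.abs(rng.normal(0.0, np.sqrt(0.5), size=(samples, n - k)))
    x1 = rng.normal(0.0, np.sqrt(1.0 / (2 * k)), size=samples)
    return np.mean(half_normals.sum(axis=1) - C * k * x1 <= 0)

if __name__ == "__main__":
    n, k, C = 4, 3, 2.0  # n - k = 1, so the exact value is arctan(C sqrt(k)) / pi
    print(internal_angle_sp_mc(n, k, C), np.arctan(C * np.sqrt(k)) / np.pi)
```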
D. Derivation of the External Angles

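Lemma 16: Suppose that $F$ is a $(k-1)$-dimensional face of the skewed cross-polytope

$$SP = \{y \in \mathbb{R}^n \mid \|y_K\|_1 + \|y_{\bar K}\|_1/C \le 1\}$$

supported on a subset $K$ with $|K| = k$. Then the external angle $\gamma(G,SP)$ between an $(l-1)$-dimensional face $G$ ($F \subseteq G$) and the skewed cross-polytope $SP$ is given by

$$\gamma(G,SP) = \frac{2^{n-l}}{\sqrt{\pi}^{\,n-l+1}} \int_0^{\infty} e^{-x^2}\left(\int_0^{\frac{x}{C\sqrt{k + (l-k)/C^2}}} e^{-y^2}\,dy\right)^{n-l} dx. \qquad (79)$$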

Proof: Without loss of generality, assume $K = \{n-k+1, \ldots, n\}$. Consider the $(l-1)$-dimensional face

$$G = \mathrm{conv}\{C \times e_{n-l+1}, \ldots, C \times e_{n-k}, e_{n-k+1}, \ldots, e_n\}$$

of the skewed cross-polytope $SP$. The $2^{n-l}$ outward normal vectors of the supporting hyperplanes of the facets containing $G$ are given by

$$\sum_{p=1}^{n-l} \pm\frac{1}{C}e_p + \sum_{p=n-l+1}^{n-k} \frac{1}{C}e_p + \sum_{p=n-k+1}^{n} e_p.$$

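Then the outward normal cone $c(G,SP)$ at the face $G$ is the positive hull of these normal vectors. Thus

$$\int_{c(G,SP)} e^{-\|x\|^2}\,dx = \gamma(G,SP)\,V_{n-l}(S^{n-l}) \times \int_0^{\infty} e^{-r^2} r^{n-l}\,dr = \gamma(G,SP)\cdot\pi^{(n-l+1)/2}, \qquad (80)$$

where $V_{n-l}(S^{n-l})$ is the spherical volume of the $(n-l)$-dimensional sphere $S^{n-l}$. Now define $U$ to be the set

$$\{x \in \mathbb{R}^{n-l+1} \mid x_{n-l+1} \ge 0,\ |x_p| \le x_{n-l+1}/C,\ 1 \le p \le (n-l)\}$$

and define $f(x_1, \ldots, x_{n-l+1}): U \to c(G,SP)$ to be the linear and bijective map

$$f(x_1, \ldots, x_{n-l+1}) = \sum_{p=1}^{n-l} x_p \times e_p + \sum_{p=n-l+1}^{n-k} \frac{x_{n-l+1}}{C} \times e_p + \sum_{p=n-k+1}^{n} x_{n-l+1} \times e_p.$$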
Then

$$\begin{aligned}
\int_{c(G,SP)} e^{-\|x'\|^2}\,dx' &= \sqrt{k + \tfrac{l-k}{C^2}} \int_U e^{-\|f(x)\|^2}\,dx \\
&= \sqrt{k + \tfrac{l-k}{C^2}} \int_0^{\infty} e^{-(k + \frac{l-k}{C^2})x^2}\left(\int_{-x/C}^{x/C} e^{-y^2}\,dy\right)^{n-l} dx \\
&= 2^{n-l}\int_0^{\infty} e^{-x^2}\left(\int_0^{\frac{x}{C\sqrt{k + (l-k)/C^2}}} e^{-y^2}\,dy\right)^{n-l} dx,
\end{aligned}$$

where $\sqrt{k + \frac{l-k}{C^2}}$ is due to the change of integral variables. Combining it with (80) leads to the desired result. ■
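Formula (79) reduces to a one-dimensional integral: using $\int_0^z e^{-y^2}dy = \frac{\sqrt{\pi}}{2}\mathrm{erf}(z)$ and $C\sqrt{k + (l-k)/C^2} = \sqrt{kC^2 + l - k}$, it becomes $\gamma(G,SP) = \frac{1}{\sqrt{\pi}}\int_0^\infty e^{-x^2}\,\mathrm{erf}\big(x/\sqrt{kC^2 + l - k}\big)^{n-l}dx$. The sketch below, assuming SciPy, evaluates this and checks two hand-verifiable cases: for $l = n$ the normal cone is a ray and $\gamma = 1/2$; for $n - l = 1$ the two outward normals span a planar cone and $\gamma = \arctan(1/s)/\pi$ with $s = \sqrt{kC^2 + l - k}$. It is an illustrative evaluation, not the authors' code.

```python
import math
from scipy.integrate import quad
from scipy.special import erf

def external_angle(n, l, k, C):
    """gamma(G, SP) from (79) in erf form; G is an (l-1)-face containing the (k-1)-face F."""
    s = math.sqrt(k * C * C + l - k)
    integrand = lambda x: math.exp(-x * x) * float(erf(x / s)) ** (n - l)
    value, _ = quad(integrand, 0.0, math.inf)
    return value / math.sqrt(math.pi)

if __name__ == "__main__":
    # A larger C enlarges s, so the external angle shrinks for fixed n, l, k.
    for C in (1.0, 2.0, 4.0):
        print(C, external_angle(n=40, l=25, k=5, C=C))
```

Setting $C = 1$ recovers the cross-polytope case, where the upper limit is $x/\sqrt{l}$ regardless of $k$.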



E. Proof of Lemma 3 

Consider any fixed $\delta > 0$. First, we consider the internal angle exponent $\psi_{int}$, where we define $\gamma' = \frac{C^2\rho\delta}{(C^2-1)\rho\delta + \nu}$. Then, for this fixed $\delta$,

$$\gamma' \le \frac{C^2\rho}{(C^2-1)\rho + 1}$$

uniformly over $\nu \in [\delta, 1]$.

Now if we take $\rho$ small enough, $\frac{1}{\gamma'}$ can be made arbitrarily large. By the asymptotic expression (31), this leads to a large enough internal decay exponent $\psi_{int}$. At the same time, the external angle exponent $\psi_{ext}$ is lower-bounded by zero and the combinatorial exponent is upper-bounded by some finite number. Then if $\rho$ is small enough, the net exponent $\psi_{net}$ is negative uniformly over the range $\nu \in [\delta, 1]$.
F. Proof of Lemma 4

We will show that for fixed $C > 1$, with $\rho(\delta) = \log(1/\delta)^{-(1+\eta)}$ and some $\delta_0 > 0$,

$$\psi_{net}(\nu; \rho(\delta), \delta) < -\delta, \qquad \delta < \delta_0,\ \nu \in [\delta, 1).$$

To this end, we need the asymptotics of $\psi_{int}(\nu)$, $\psi_{ext}(\nu)$ and $\psi_{com}(\nu)$ as $\delta \to 0$ and $\rho(\delta) = \log(1/\delta)^{-(1+\eta)}$.
With

$$H(\nu) + H(\rho\delta/\nu)\,\nu = H(\rho\delta) + H\Big(\frac{\nu - \rho\delta}{1 - \rho\delta}\Big)(1 - \rho\delta),$$

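from its definition, $\psi_{net}(\nu; \rho, \delta)$ is equal to

$$H(\nu) - \psi_{ext}(\nu) - \xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta) + \rho\delta\log(2) + H(\rho\delta/\nu)\,\nu.$$

From the derivation (or the expression) of the external angle $\gamma(G,SP)$ in this paper, $\gamma(G,SP)$ is a decreasing function in $C$. So we can upper-bound $\gamma(G,SP)$ uniformly in $\nu \in [\delta, 1]$, for any $C > 1$, by

$$\frac{2^{n-l}}{\sqrt{\pi}^{\,n-l+1}}\int_0^{\infty} e^{-x^2}\left(\int_0^{x/\sqrt{l}} e^{-y^2}\,dy\right)^{n-l} dx,$$

namely the expression for the external angle when $C = 1$.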

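Now define $\Omega(\nu) = H(\nu) - \psi_{ext}^{C=1}(\nu)$, where $\psi_{ext}^{C=1}(\nu)$ is the external exponent when $C = 1$. Then from the asymptotic formula (27), we have

$$H(\nu) - \psi_{ext}(\nu) \le \Omega(\nu) \sim \frac{1}{2}\log\Big(\log\Big(\frac{1}{\nu}\Big)\Big)\,\nu,$$

as $\nu \to 0$.

So $\psi_{net}(\nu; \rho, \delta)$ is no bigger than

$$\Omega(\nu) - \xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta) + \rho\delta\log(2) + H(\rho\delta/\nu)\,\nu.$$

From [Don06b], for a certain $\delta_1$, if $\delta < \delta_1$,

$$H(\rho\delta/\nu)\,\nu \le H(\rho)\,\delta + 2\rho(\nu - \delta),$$

so we have

$$\psi_{net}(\nu) \le K(\nu; \rho, \delta) + [\rho\delta\log(2) + H(\rho)\,\delta] + 2\rho(\nu - \delta),$$

where $K(\nu; \rho, \delta) = \Omega(\nu) - \xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta)$.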
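The identity $H(\nu) + \nu H(\rho\delta/\nu) = H(\rho\delta) + (1-\rho\delta)H\big(\frac{\nu-\rho\delta}{1-\rho\delta}\big)$ used in this proof is the chain rule for the entropy of the three-part split $(\rho\delta,\ \nu - \rho\delta,\ 1 - \nu)$, mirroring $\binom{n}{l}\binom{l}{k} = \binom{n}{k}\binom{n-k}{l-k}$. A quick numerical sketch, assuming natural logarithms:

```python
import math

def H(p):
    """Natural-log binary entropy with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def identity_gap(nu, rho_delta):
    """Left side minus right side of the entropy identity, for 0 < rho_delta < nu < 1."""
    lhs = H(nu) + nu * H(rho_delta / nu)
    rhs = H(rho_delta) + (1 - rho_delta) * H((nu - rho_delta) / (1 - rho_delta))
    return lhs - rhs

if __name__ == "__main__":
    for nu, rd in [(0.6, 0.03), (0.9, 0.2), (0.3, 0.29)]:
        print(nu, rd, identity_gap(nu, rd))  # zero up to floating-point rounding
```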
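As we will show later in Lemma 17, $K(\nu; \rho, \delta)$ is a concave function in $\nu \in [\delta, 1]$ if $\delta < \delta_2$. Also, we will show that for $\delta < \delta_3$,

$$K'(\delta; \rho, \delta) < -\frac{\eta}{4}\log\Big(\log\Big(\frac{1}{\delta}\Big)\Big) \qquad (81)$$

$$K(\delta; \rho, \delta) < -\delta \times \frac{\eta}{4}\log\Big(\log\Big(\frac{1}{\delta}\Big)\Big) \qquad (82)$$

where $K'(\nu; \rho, \delta) = \frac{\partial K(\nu; \rho, \delta)}{\partial\nu}$. So there exists $\delta_4 > 0$ so that for any $\delta \in (0, \delta_4)$,

$$K'(\delta; \rho, \delta) < -2\rho.$$

Also there exists $\delta_5 > 0$ so that for any $0 < \delta < \delta_5$,

$$K(\delta; \rho, \delta) + \rho\delta\log(2) + H(\rho)\,\delta < -\delta.$$

Then by the concavity of $K(\nu; \rho, \delta)$, if $\delta < \min\{\delta_1, \delta_2, \delta_3, \delta_4, \delta_5\}$,

$$\psi_{net}(\nu; \rho, \delta) < -\delta$$

uniformly over the interval $\nu \in [\delta, 1]$.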

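Now we need to prove (81) and (82). As computed in [Don06b],

$$\Omega'(\nu) \sim \frac{1}{2}\log\Big(\log\Big(\frac{1}{\nu}\Big)\Big), \qquad \nu \to 0,$$

and

$$\Omega(\nu) \sim \frac{1}{2}\log\Big(\log\Big(\frac{1}{\nu}\Big)\Big)\,\nu, \qquad \nu \to 0.$$

By (31), we know that as $\delta \to 0$, and with $\nu = \delta$,

$$\xi_{\gamma'}(y_{\gamma'}) \sim \frac{1}{2}\log\Big(\frac{1}{\gamma'}\Big) \sim \frac{1}{2}\log\Big(\log\Big(\frac{1}{\delta}\Big)\Big)(1 + \eta).$$

Hence for $\delta < \delta_6$,

$$\xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta) \ge \Big(1 + \frac{3\eta}{4}\Big)\frac{1}{2}\log\Big(\log\Big(\frac{1}{\delta}\Big)\Big)\,\delta. \qquad (83)$$

Following this, there exists $\delta_7 > 0$ so that for $\delta < \delta_7$,

$$K(\delta; \rho, \delta) < -\frac{\eta}{4}\log\Big(\log\Big(\frac{1}{\delta}\Big)\Big)\,\delta.$$

Also, from the asymptotics of $\Omega'(\nu)$ and the asymptotics of the derivative of $\xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta)$ with respect to $\nu$ in the next lemma, Lemma 17, we further have

$$K'(\delta; \rho, \delta) < -\frac{\eta}{4}\log\Big(\log\Big(\frac{1}{\delta}\Big)\Big).$$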
Lemma 17: If $\rho$ is small enough, $K(\nu; \rho, \delta)$ is concave as a function of $\nu$.

Proof: We define

$$T(\nu; \rho, \delta) = \xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta).$$

Since $K(\nu; \rho, \delta) = \Omega(\nu) - T(\nu; \rho, \delta)$ and $\Omega(\nu)$ is a concave function in $\nu$ [Don06b], we only need to show that $T(\nu; \rho, \delta)$ is a convex function in $\nu$.
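Recall that $\gamma' = \frac{C^2\rho\delta}{(C^2-1)\rho\delta + \nu}$, and we first look at its derivatives in $\nu$:

$$\frac{\partial\gamma'}{\partial\nu} = -\frac{C^2\rho\delta}{\big((C^2-1)\rho\delta + \nu\big)^2} = -\frac{\gamma'^2}{C^2\rho\delta}, \qquad \frac{\partial^2\gamma'}{\partial\nu^2} = \frac{2\gamma'^3}{(C^2\rho\delta)^2}.$$

So

$$\nu - \rho\delta = C^2\rho\delta\,\frac{1 - \gamma'}{\gamma'}.$$

Writing $g(\gamma') = \xi_{\gamma'}(y_{\gamma'})$ and applying the chain rule, we have

$$\frac{\partial^2 T(\nu; \rho, \delta)}{\partial\nu^2} = \frac{\partial^2 g}{\partial\gamma'^2}\Big(\frac{\partial\gamma'}{\partial\nu}\Big)^2(\nu - \rho\delta) + \frac{\partial g}{\partial\gamma'}\,\frac{\partial^2\gamma'}{\partial\nu^2}(\nu - \rho\delta) + 2\,\frac{\partial g}{\partial\gamma'}\,\frac{\partial\gamma'}{\partial\nu} = \frac{\gamma'^2}{C^2\rho\delta}\Big[\gamma'(1-\gamma')\frac{\partial^2 g}{\partial\gamma'^2} - 2\gamma'\frac{\partial g}{\partial\gamma'}\Big].$$

It has been shown in [Don06b] that as $\gamma \to 0$,

$$\frac{\partial\xi_\gamma(y_\gamma)}{\partial\gamma} \sim -\frac{\gamma^{-1}}{2},$$

and that $\frac{\partial^2\xi_\gamma(y_\gamma)}{\partial\gamma^2}$ is positive and of order $\gamma^{-2}$.

So from the definition of $\gamma'$ (note that $\gamma' \le \frac{C^2\rho}{(C^2-1)\rho + 1}$ for $\nu \ge \delta$, which tends to 0 as $\rho \to 0$), there exists a small enough $\rho_0$ such that for any $\rho < \rho_0$,

$$\frac{\partial^2 T(\nu; \rho, \delta)}{\partial\nu^2} > 0, \qquad \nu \in [\delta, 1], \qquad (84)$$

which then implies the concavity of $K(\nu; \rho, \delta)$. ■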
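The chain-rule computation for $\partial^2 T/\partial\nu^2$ can be verified symbolically. The sketch below, assuming SymPy, takes $\gamma'(\nu) = C^2\rho\delta/((C^2-1)\rho\delta + \nu)$ and, since no closed form of $\gamma \mapsto \xi_\gamma(y_\gamma)$ is used here, substitutes a few concrete test functions $g$ (including $-\frac{1}{2}\log\gamma$, which mimics the small-$\gamma$ behavior $\xi_\gamma(y_\gamma) \sim \frac{1}{2}\log(1/\gamma)$). It checks the identity $\frac{\partial^2}{\partial\nu^2}\big[g(\gamma')(\nu - \rho\delta)\big] = \frac{\gamma'^2}{C^2\rho\delta}\big[\gamma'(1-\gamma')g''(\gamma') - 2\gamma' g'(\gamma')\big]$. Checking sample functions is a sanity check, not a proof.

```python
import sympy as sp

nu, rho, delta, C, x = sp.symbols("nu rho delta C x", positive=True)

# gamma' as a function of nu, with C, rho, delta held fixed
gp = C**2 * rho * delta / ((C**2 - 1) * rho * delta + nu)

def second_derivative_gap(g_expr):
    """d^2/dnu^2 [g(gamma') (nu - rho delta)] minus the closed form; cancels to 0 if the identity holds."""
    T = g_expr.subs(x, gp) * (nu - rho * delta)
    lhs = sp.diff(T, nu, 2)
    g1 = sp.diff(g_expr, x).subs(x, gp)
    g2 = sp.diff(g_expr, x, 2).subs(x, gp)
    rhs = gp**2 / (C**2 * rho * delta) * (gp * (1 - gp) * g2 - 2 * gp * g1)
    return sp.cancel(lhs - rhs)

if __name__ == "__main__":
    for g in (-sp.log(x) / 2, x**3, 1 / (1 + x)):
        print(g, second_derivative_gap(g))
```

With $g = -\frac{1}{2}\log\gamma$ the bracket equals $\frac{1-\gamma'}{2\gamma'} + 1 > 0$, which is the convexity the lemma needs.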

G. Proof of Lemma 5 

Proof: Suppose instead that $\rho_N(\delta, C)\,\delta > \frac{1}{C+1}$. Then for every vector $w$ from the null space of the measurement matrix $A$, any $\rho_N(\delta, C)\,\delta$ fraction of the $n$ components of $w$ takes no more than a $\frac{1}{C+1}$ fraction of $\|w\|_1$. But this cannot be true if we consider the $\rho_N(\delta, C)\,\delta$ fraction of the components of $w$ with the largest magnitudes, since they take at least a $\rho_N(\delta, C)\,\delta$ fraction of $\|w\|_1$.

Now we only need to prove the lower bound for $\rho_N(\delta, C)$; in fact, we argue that

$$\rho_N(\delta, C) \ge \frac{\rho_N(\delta, C=1)}{C^2}.$$

We know from Lemma 3 that $\rho_N(\delta, C) > 0$ for any $C > 1$. Denote by $\psi_{net}(C)$, $\psi_{com}(\nu; \rho, \delta, C)$, $\psi_{int}(\nu; \rho, \delta, C)$ and $\psi_{ext}(\nu; \rho, \delta, C)$ the respective exponents for a given $C$. Because $\rho_N(\delta, C=1) > 0$, for any $\rho = \rho_N(\delta, C=1) - \epsilon$, where $\epsilon > 0$ is an arbitrarily small number, the net exponent $\psi_{net}(C=1)$ is negative uniformly over $\nu \in [\delta, 1]$.

By examining the formula (18) for the external angle $\gamma(G,SP)$, where $G$ is an $(l-1)$-dimensional face of the skewed cross-polytope $SP$, we see that $\gamma(G,SP)$ is a decreasing function in both $k$ and $C$ for a fixed $l$. So $\gamma(G,SP)$ is upper-bounded by

$$\frac{2^{n-l}}{\sqrt{\pi}^{\,n-l+1}}\int_0^{\infty} e^{-x^2}\left(\int_0^{x/\sqrt{l}} e^{-y^2}\,dy\right)^{n-l} dx, \qquad (85)$$

namely the expression for the external angle when $C = 1$. Then for any $C > 1$ and any $k$, $\psi_{ext}(\nu; \rho, \delta, C)$ is lower-bounded by $\psi_{ext}(\nu; \rho, \delta, C=1)$.

Now let us check $\psi_{int}(\nu; \rho, \delta, C)$ by using the formula (29). With

$$\gamma' = \frac{C^2\rho\delta}{(C^2-1)\rho\delta + \nu},$$

we have

$$\frac{1}{\gamma'} = 1 - \frac{1}{C^2} + \frac{\nu}{C^2\rho\delta}. \qquad (86)$$

Then for any fixed $\delta > 0$, if we take $\rho = \frac{\rho_N(\delta, C=1) - \epsilon}{C^2}$, where $\epsilon$ is an arbitrarily small positive number, then for any $\nu \ge \delta$, $\frac{1}{\gamma'}$ is an increasing function in $C$. So, following easily from its definition, $\xi_{\gamma'}(y_{\gamma'})$ is an increasing function in $C$. This further implies that $\psi_{int}(\nu; \rho, \delta)$ is an increasing function in $C$ if we take $\rho = \frac{\rho_N(\delta, C=1) - \epsilon}{C^2}$, for any $\nu \ge \delta$.

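Also, for any fixed $\nu$ and $\delta$, it is not hard to show that $\psi_{com}(\nu; \rho, \delta, C)$ is a decreasing function in $C$ if $\rho = \frac{\rho_N(\delta, C=1) - \epsilon}{C^2}$. This is because in (14),

$$\binom{n}{k}\binom{n-k}{l-k} = \binom{n}{l}\binom{l}{k}.$$

Thus for any $C > 1$, if $\rho = \frac{\rho_N(\delta, C=1) - \epsilon}{C^2}$, the net exponent $\psi_{net}(C)$ is also negative uniformly over $\nu \in [\delta, 1]$. Since the parameter $\epsilon$ can be arbitrarily small, our claim and Lemma 5 then follow. ■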



H. Proof of Lemma 13 

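Proof: First, we notice that for any $C > 1$,

$$\zeta_W(\delta) \ge \rho_N(\delta, C)\,\delta, \qquad \zeta_S(\delta) = \rho_N(\delta, C)\,\delta,$$

so by Lemma 3,

$$\zeta_W(\delta) > 0, \qquad \zeta_{sec}(\delta) > 0, \qquad \zeta_S(\delta) > 0.$$

Now we will prove

$$\lim_{\delta \to 1} \zeta_W(\delta) = 1.$$

As discussed in previous sections, we know that the decay exponent for the probability that the condition (56) is violated is equal to

$$H\Big(\frac{\nu - \rho\delta}{1 - \rho\delta}\Big)(1 - \rho\delta) - \xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta) - \psi_{ext}(\nu).$$

But from the derivations of the exponents, we know that

$$0 = \lim_{\delta \to 1}\sup_{\nu \in [\delta, 1]} \psi_{ext}(\nu), \qquad 0 = \lim_{\delta \to 1}\sup_{\nu \in [\delta, 1]} H\Big(\frac{\nu - \rho\delta}{1 - \rho\delta}\Big)(1 - \rho\delta),$$

and

$$\xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta) \ge \xi_{\gamma'(\delta)}(y_{\gamma'(\delta)})(1 - \rho)\,\delta > 0, \qquad \nu \ge \delta,$$

where

$$\gamma'(\delta) = \frac{C^2\rho\delta}{(C^2-1)\rho\delta + \delta} = \frac{C^2\rho}{(C^2-1)\rho + 1}.$$

Noticing that $\xi_{\gamma'(\delta)}(y_{\gamma'(\delta)})(1 - \rho) > 0$ is determined by $\rho$ alone, for any $0 < \rho < 1$ there exists a big enough $\delta < 1$ such that

$$H\Big(\frac{\nu - \rho\delta}{1 - \rho\delta}\Big)(1 - \rho\delta) - \xi_{\gamma'}(y_{\gamma'})(\nu - \rho\delta) - \psi_{ext}(\nu) < 0$$

uniformly over $[\delta, 1]$. Then it follows that

$$\lim_{\delta \to 1} \zeta_W(\delta) = 1,$$

for any $C > 1$.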



Acknowledgment 

This work was supported in part by the National Science Foundation under grant no. CCF-0729203, by the David and Lucille Packard Foundation, and by Caltech's Lee Center for Advanced Networking.

References 

[AS92] Fernando Affentranger and Rolf Schneider. Random projections of regular simplices. Discrete Comput. Geom., 7(3):219-226, 1992.
[BCT09] J. D. Blanchard, C. Cartis, and J. Tanner. The restricted isometry property and ℓq regularization: Phase transitions for sparse approximation. 2009. Submitted for publication.
[BDDW08] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253-263, 2008.
[Boo86] W. M. Boothby. An Introduction to Differentiable Manifolds and Riemannian Geometry. Springer-Verlag, 1986. 2nd ed. San Diego, CA: Academic.
[Can06] E. J. Candès. Compressive sampling. In International Congress of Mathematicians, Vol. III, pages 1433-1452. Eur. Math. Soc., Zurich, 2006.
[CDD08] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. J. Amer. Math. Soc., 22:211-231, 2009.
[CM73] J. F. Claerbout and F. Muir. Robust modeling with erratic data. Geophysics, 38(5):826-844, 1973.

May 21, 2010 DRAFT 






[CRT05] Emmanuel J. Candès, Justin Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59:1207-1223, 2005.
[CRT06] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489-509, 2006.
[CT05] Emmanuel J. Candès and Terence Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203-4215, 2005.
[DMM10] D. Donoho, A. Maleki, and A. Montanari. The noise-sensitivity phase transition in compressed sensing. arXiv:1004.1218, 2010.
[Don06a] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289-1306, 2006.
[Don06b] David Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to dimension. Discrete and Computational Geometry, 35(4):617-652, 2006.
[DT05] David L. Donoho and Jared Tanner. Neighborliness of randomly projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA, 102(27):9452-9457, 2005.
[FN03] A. Feuer and A. Nemirovski. On sparse representation in pairs of bases. IEEE Transactions on Information Theory, 49(6):1579-1581, 2003.
[Gem80] Stuart Geman. A limit theorem for the norm of random matrices. Annals of Probability, 8(2):252-261, 1980.
[Grü68] Branko Grünbaum. Grassmann angles of convex polytopes. Acta Math., 121:293-302, 1968.
[Grü03] Branko Grünbaum. Convex Polytopes, volume 221 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition, 2003. Prepared and with a preface by Volker Kaibel, Victor Klee, and Günter M. Ziegler.
[Had79] H. Hadwiger. Gitterpunktanzahl im Simplex und Willssche Vermutung. Math. Ann., 239:271-288, 1979.
[HN] J. Haupt and R. Nowak. Signal reconstruction from noisy random projections. IEEE Trans. on Information Theory, 52.
[KBH99] Károly Böröczky Jr. and Martin Henk. Random projections of regular polytopes. Arch. Math. (Basel), 73(6):465-473, 1999.
[KT07] Boris S. Kashin and Vladimir N. Temlyakov. A remark on compressed sensing. Mathematical Notes, 82(5):748-755, November 2007.
[KXAH09] M. Amin Khajehnejad, Weiyu Xu, Amir Salman Avestimehr, and Babak Hassibi. Weighted ℓ1 minimization for sparse recovery with prior information. In Proceedings of the International Symposium on Information Theory, 2009.
[LN06] N. Linial and I. Novik. How neighborly can a centrally symmetric polytope be? Discrete and Computational Geometry, 36(6):273-281, 2006.
[McM75] Peter McMullen. Non-linear angle-sum relations for polyhedral cones and polytopes. Math. Proc. Cambridge Philos. Soc., 78(2):247-261, 1975.
[MP67] V. A. Marčenko and L. A. Pastur. Distributions of eigenvalues for some sets of random matrices. Math. USSR-Sbornik, 1:457-483, 1967.
[Mui05] R. J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley-Interscience, 2005. 2nd ed.
[RIC] http://www.dsp.ece.rice.edu/cs/.
[RV] Mark Rudelson and Roman Vershynin. Geometric approach to error correcting codes and reconstruction of signals. International Mathematics Research Notices, 64.




[San52] L. A. Santaló. Geometría integral en espacios de curvatura constante. Rep. Argentina Publ. Com. Nac. Energía Atómica, Ser. Mat. 1, No. 1, 1952.
[Sil85] J. W. Silverstein. The smallest eigenvalue of a large dimensional Wishart matrix. The Annals of Probability, 13:1364-1368, 1985.
[Sto09] Mihailo Stojnic. Various thresholds for ℓ1-optimization in compressed sensing. 2009. Preprint available at http://arxiv.org/abs/0907.3666.
[SXH08a] Mihailo Stojnic, Weiyu Xu, and Babak Hassibi. Compressed sensing - probabilistic analysis of a null-space characterization. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2008.
[SXH08b] Mihailo Stojnic, Weiyu Xu, and Babak Hassibi. Compressed sensing of approximately sparse signals. In IEEE International Symposium on Information Theory, 2008.
[TWD+06] J. Tropp, M. Wakin, M. Duarte, D. Baron, and R. Baraniuk. Random filters for compressive sampling and reconstruction. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2006.
[Vav09] S. Vavasis. Derivation of compressive sensing theorems for the spherical section property. University of Waterloo, CO 769 lecture notes, 2009.
[VS92] A. M. Vershik and P. V. Sporyshev. Asymptotic behavior of the number of faces of random polyhedra and the neighborliness problem. Selecta Mathematica Sovietica, 11(2):181-201, 1992.
[Wai06] M. Wainwright. Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming. Technical report, UC Berkeley, Department of Statistics, 2006.
[XH08] Weiyu Xu and Babak Hassibi. Compressed sensing over the Grassmann manifold: A unified analytical framework. In Proceedings of the Forty-Sixth Annual Allerton Conference on Communication, Control, and Computing, 2008.
[XKAH10] Weiyu Xu, Amin Khajehnejad, Amir Salman Avestimehr, and Babak Hassibi. Breaking through the thresholds: an analysis for iterative reweighted ℓ1 minimization via the Grassmann angle framework. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010.
[Zha06] Y. Zhang. When is missing data recoverable? 2006. Available online at http://www.caam.rice.edu/~zhang/reports/index.html.
[Zha08] Y. Zhang. Theory of compressive sensing via ℓ1-minimization: a non-RIP analysis and extensions. 2008. Available online at http://www.caam.rice.edu/~zhang/reports/index.html.






